[jira] [Commented] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041238#comment-15041238
 ] 

Hadoop QA commented on YARN-4340:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 56s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 56s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 19m 31s 
{color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new 
issues (was 751, now 751). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 28m 56s 
{color} | {color:red} root-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new 
issues (was 745, now 745). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 15s 
{color} | {color:red} Patch generated 32 new checkstyle issues in root (total 
was 352, now 382). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
introduced 4 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 20s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 
generated 9 new issues (was 100, now 100). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 49s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_85 with JDK v1.7.0_85 
generated 5 new issues (was 0, now 5). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 8s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{col

[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041235#comment-15041235
 ] 

Hadoop QA commented on YARN-4411:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} YARN-4411 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775739/YARN-4411.001.patch |
| JIRA Issue | YARN-4411 |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9858/console |


This message was automatically generated.



> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 may cause IllegalArgumentException in 
> RMAppAttemptImpl:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041201#comment-15041201
 ] 

Hadoop QA commented on YARN-4002:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 8s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775729/YARN-4002-rwlock.p

[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-03 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041176#comment-15041176
 ] 

Konstantinos Karanasos commented on YARN-2885:
--

[~leftnoteasy], I think the Distributed Scheduler AM Service makes sense.

Given that we will already add the Distributed Scheduling Coordinator in the RM 
(which will be used for the top-k technique, and later for the corrective 
mechanisms in YARN-2888), what about using the same service for delegating the 
AMProtocol wrapper (rather than creating an additional one)?

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2015-12-03 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041169#comment-15041169
 ] 

Konstantinos Karanasos commented on YARN-2877:
--

Hi [~wangda],

Thanks for pointing out HADOOP-11552. It seems it can also be used for the same 
purpose.
I would suggest following the technique of frequent AM-LocalRM heartbeats and 
less frequent LocalRM-RM heartbeats to start with. Once HADOOP-11552 gets 
resolved, we can consider using it.

bq. I think top-k node list technique cannot completely solve the over 
subscribe issue, in a production cluster, application comes in waves, it is 
possible that few large applications can exhaust all resources in a cluster 
within few seconds. Maybe another possible approach to mitigate the issue is: 
propagating queue-able containers from NM to RM periodically, so NM can still 
make decision but RM can also be aware of these queue-able containers.
As long as k is sufficiently big, the phenomenon you describe should not be 
very pronounced. 
Moreover, corrective mechanisms (YARN-2888) will lead to moving tasks from 
highly-loaded nodes to less busy ones.
Going further, what you are suggesting would also make sense.

> Extend YARN to support distributed scheduling
> -
>
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>Assignee: Konstantinos Karanasos
> Attachments: distributed-scheduling-design-doc_v1.pdf
>
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency for tasks where the scheduling time dominates 
> (i.e., the task execution time is much shorter than the time required to 
> obtain a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-03 Thread yarntime (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yarntime updated YARN-4411:
---
Attachment: YARN-4411.001.patch

A simple patch which replaces
  YarnApplicationAttemptState.valueOf(this.getState().toString())
with
  this.createApplicationAttemptState()

No tests added.
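
For illustration only, here is a minimal sketch of the kind of explicit conversion the patch relies on; the switch cases and the mapping of the internal-only state are assumptions for this example, not the actual patch code:

{code:java}
import org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

// Hedged sketch: convert the internal attempt state to the public
// YarnApplicationAttemptState explicitly, instead of calling
// YarnApplicationAttemptState.valueOf(state.toString()), which throws
// IllegalArgumentException for internal-only states such as
// LAUNCHED_UNMANAGED_SAVING.
public final class AttemptStateConverter {
  public static YarnApplicationAttemptState convert(RMAppAttemptState state) {
    switch (state) {
      case LAUNCHED_UNMANAGED_SAVING:
        // Internal transient state; map it to the closest public state.
        // Other internal-only states would need similar explicit cases.
        return YarnApplicationAttemptState.LAUNCHED;
      default:
        // States that exist in both enums under the same name convert directly.
        return YarnApplicationAttemptState.valueOf(state.toString());
    }
  }
}
{code}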

> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 may cause IllegalArgumentException in 
> RMAppAttemptImpl:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041158#comment-15041158
 ] 

Hudson commented on YARN-4405:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #663 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/663/])
YARN-4405. Support node label store in non-appendable file system. (jianhe: rev 
755dda8dd8bb23864abc752bad506f223fcac010)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java


> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 
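
As a rough illustration of the general pattern (not the actual NonAppendableFSNodeLabelStore code), a store for a non-appendable file system can rewrite the full state to a new file and swap it in, instead of appending edit-log records; the file names and the text serialization below are assumptions:

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged sketch: on every update, serialize the complete node-label mapping
// to a temporary file and rename it over the previous mirror, so no append()
// call is ever needed on the underlying file system.
public class RewriteOnUpdateLabelStore {
  private final FileSystem fs;
  private final Path mirror;     // e.g. <store-root>/nodelabel.mirror (assumed name)
  private final Path tmpMirror;  // e.g. <store-root>/nodelabel.mirror.tmp (assumed name)

  public RewriteOnUpdateLabelStore(Configuration conf, Path root) throws IOException {
    this.fs = FileSystem.get(conf);
    this.mirror = new Path(root, "nodelabel.mirror");
    this.tmpMirror = new Path(root, "nodelabel.mirror.tmp");
  }

  // Persist the full node-to-labels mapping as a fresh snapshot.
  public synchronized void persist(Map<String, Set<String>> nodeToLabels)
      throws IOException {
    try (FSDataOutputStream out = fs.create(tmpMirror, true)) {
      for (Map.Entry<String, Set<String>> e : nodeToLabels.entrySet()) {
        out.writeBytes(e.getKey() + "=" + e.getValue() + "\n");
      }
    }
    fs.delete(mirror, false);      // ignore result if the mirror is absent
    fs.rename(tmpMirror, mirror);  // swap the new snapshot into place
  }
}
{code}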



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041156#comment-15041156
 ] 

Hudson commented on YARN-4292:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #663 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/663/])
YARN-4292. ResourceUtilization should be a part of NodeInfo REST API. (wangda: 
rev a2c3bfc8c1349102a7f2bc4ea96b80b429ac227b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceUtilizationInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java


> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, 
> 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039753#comment-15039753
 ] 

Sangjin Lee commented on YARN-4356:
---

I've run unit tests for the affected projects, and confirmed that there are no 
additional failures.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-03 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4356:
--
Attachment: YARN-4356-feature-YARN-2928.poc.001.patch

I'm posting a POC patch to get early feedback.

I haven't added a config that checks the version of the timeline service yet, 
and I need to sort out various configuration parameters a little more.

But assuming those things will be in place later on, please take a look at 
whether the timeline service v.2 code/memory/behavior is cleanly turned off if 
the timeline service v.2 is disabled. I would greatly appreciate your feedback. 
Thanks!
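
For illustration only, a minimal sketch of the kind of guard the review is asked to verify; the version key ("yarn.timeline-service.version") is the configuration the comment above says has not been added yet, so treat both key names and the helper below as assumptions:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: v.2 code paths should only be wired up when the timeline
// service is enabled AND configured as v.2; otherwise nothing is created,
// so there is no memory or behavioral impact on the v.1 path.
public final class TimelineV2Guard {
  public static boolean timelineV2Enabled(Configuration conf) {
    boolean enabled = conf.getBoolean("yarn.timeline-service.enabled", false);
    float version = conf.getFloat("yarn.timeline-service.version", 1.0f);
    return enabled && version >= 2.0f;
  }

  public static void maybeStartCollectors(Configuration conf) {
    if (!timelineV2Enabled(conf)) {
      // Disabled: create no collector/writer objects and start no threads.
      return;
    }
    // ... initialize v.2 collectors/writers only on this branch ...
  }
}
{code}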

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Attachment: YARN-4002-rwlock.patch
YARN-4002-lockless-read.patch

Two patches for the two proposed solutions submitted.
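
For illustration only, a minimal sketch of the two approaches described in the issue (field and method names are illustrative, not the patch code):

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch of the two proposed solutions for NodesListManager.isValidNode.
public class HostsChecker {

  // Approach 1 (rwlock patch): heartbeat handlers read concurrently;
  // only "refresh nodes" takes the write lock.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Set<String> includes = new HashSet<>();
  private Set<String> excludes = new HashSet<>();

  public boolean isValidNodeWithRwLock(String host) {
    lock.readLock().lock();
    try {
      return (includes.isEmpty() || includes.contains(host))
          && !excludes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void refreshWithRwLock(Set<String> inc, Set<String> exc) {
    lock.writeLock().lock();
    try {
      includes = new HashSet<>(inc);
      excludes = new HashSet<>(exc);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Approach 2 (lockless-read patch): readers take no lock at all; refresh
  // publishes new immutable sets via a volatile reference assignment, which
  // is atomic in Java and gives readers a consistent snapshot.
  private volatile Set<String> includesSnapshot = Collections.emptySet();
  private volatile Set<String> excludesSnapshot = Collections.emptySet();

  public boolean isValidNodeLockless(String host) {
    Set<String> inc = includesSnapshot;   // single volatile read
    Set<String> exc = excludesSnapshot;
    return (inc.isEmpty() || inc.contains(host)) && !exc.contains(host);
  }

  public void refreshLockless(Set<String> inc, Set<String> exc) {
    includesSnapshot = Collections.unmodifiableSet(new HashSet<>(inc));
    excludesSnapshot = Collections.unmodifiableSet(new HashSet<>(exc));
  }
}
{code}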

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: YARN-4002-lockless-read.patch, YARN-4002-rwlock.patch, 
> YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes". All RPC threads handling node heartbeats are 
> only readers, so a RWLock could be used to allow concurrent access by RPC 
> threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the 
> reader-side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4417) Make RM and Timeline-server REST APIs more consistent

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039714#comment-15039714
 ] 

Hadoop QA commented on YARN-4417:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 48, now 51). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 46s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 53s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 7s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourceman

[jira] [Commented] (YARN-4419) Trunk building failed

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039708#comment-15039708
 ] 

Hudson commented on YARN-4419:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8918 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8918/])
Add missing file for YARN-4419 (jianhe: rev 
e84d6ca2df775bb4c93f6c08b345ac30b3a4525b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NonAppendableFSNodeLabelStore.java


> Trunk building failed
> -
>
> Key: YARN-4419
> URL: https://issues.apache.org/jira/browse/YARN-4419
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kai Zheng
>Assignee: Jian He
>
> Checking out the latest codes, mvn clean package -DskipTests failed as below.
> {noformat}
> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
> hadoop-yarn-common ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 72 source files to 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes
> [INFO] -
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15]
>  cannot find symbol
>   symbol:   class NonAppendableFSNodeLabelStore
>   location: class 
> org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
> [INFO] 1 error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039679#comment-15039679
 ] 

Sunil G commented on YARN-4405:
---

Thanks [~jianhe] and [~leftnoteasy]. It's fine now.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039677#comment-15039677
 ] 

Wangda Tan commented on YARN-4405:
--

Thanks [~jianhe], I was committing just now and found it is already fixed.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4419) Trunk building failed

2015-12-03 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-4419.
---
Resolution: Fixed

> Trunk building failed
> -
>
> Key: YARN-4419
> URL: https://issues.apache.org/jira/browse/YARN-4419
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kai Zheng
>Assignee: Jian He
>
> Checking out the latest codes, mvn clean package -DskipTests failed as below.
> {noformat}
> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
> hadoop-yarn-common ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 72 source files to 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes
> [INFO] -
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15]
>  cannot find symbol
>   symbol:   class NonAppendableFSNodeLabelStore
>   location: class 
> org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
> [INFO] 1 error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4419) Trunk building failed

2015-12-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039673#comment-15039673
 ] 

Jian He commented on YARN-4419:
---

sorry, missed a file while committing, fixed now.

> Trunk building failed
> -
>
> Key: YARN-4419
> URL: https://issues.apache.org/jira/browse/YARN-4419
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kai Zheng
>Assignee: Jian He
>
> Checking out the latest codes, mvn clean package -DskipTests failed as below.
> {noformat}
> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
> hadoop-yarn-common ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 72 source files to 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes
> [INFO] -
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15]
>  cannot find symbol
>   symbol:   class NonAppendableFSNodeLabelStore
>   location: class 
> org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
> [INFO] 1 error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039671#comment-15039671
 ] 

Jian He commented on YARN-4405:
---

sorry, missed a file while committing, fixed now.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039670#comment-15039670
 ] 

Wangda Tan commented on YARN-4405:
--

Thanks for reporting, [~drankye], [~sunilg].

Fixing this issue now.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: YARN-4340.v5.patch

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4419) Trunk building failed

2015-12-03 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-4419:
-

Assignee: Jian He

> Trunk building failed
> -
>
> Key: YARN-4419
> URL: https://issues.apache.org/jira/browse/YARN-4419
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kai Zheng
>Assignee: Jian He
>
> Checking out the latest codes, mvn clean package -DskipTests failed as below.
> {noformat}
> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
> hadoop-yarn-common ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 72 source files to 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes
> [INFO] -
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15]
>  cannot find symbol
>   symbol:   class NonAppendableFSNodeLabelStore
>   location: class 
> org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
> [INFO] 1 error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: (was: YARN-4340.v5.patch)

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039663#comment-15039663
 ] 

Sean Po commented on YARN-4340:
---

I took all your suggestions above, and with regard to your second comment, I 
also removed 'user' from the interface in v5 of the patch.
The REST aspect of the code was also removed; it will be part of a second 
ticket, YARN-4420.
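
Purely as a hypothetical illustration of the kind of query this "list" API enables (none of the type or method names below are taken from the patch; they are assumptions for the sketch):

{code:java}
import java.util.List;

// Hypothetical shape of a "list reservations" query: filter by queue,
// time range, and optionally a single reservation id; no 'user' parameter,
// matching the v5 interface change described above.
public interface ReservationListingClient {

  /**
   * Return the reservations in the given queue whose time range overlaps
   * [startTime, endTime], optionally filtered to a single reservation id
   * (pass null to list all ids in the range).
   */
  List<ReservationSummary> listReservations(
      String queue, String reservationId, long startTime, long endTime);

  // Minimal summary record for the result of the query.
  final class ReservationSummary {
    public final String reservationId;
    public final long startTime;
    public final long endTime;

    public ReservationSummary(String reservationId, long startTime, long endTime) {
      this.reservationId = reservationId;
      this.startTime = startTime;
      this.endTime = endTime;
    }
  }
}
{code}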

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Description: 
This JIRA tracks changes to the APIs of the reservation system, and enables 
querying the reservation system for which reservations exist by "time-range, 
reservation-id".

YARN-4420 has a dependency on this.

  was:
This JIRA tracks changes to the APIs of the reservation system, and enables 
querying the reservation system on which reservation exists by "time-range, 
reservation-id, username".


> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4420) Add REST API for List Reservations

2015-12-03 Thread Sean Po (JIRA)
Sean Po created YARN-4420:
-

 Summary: Add REST API for List Reservations
 Key: YARN-4420
 URL: https://issues.apache.org/jira/browse/YARN-4420
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, fairscheduler, resourcemanager
Reporter: Sean Po
Assignee: Sean Po
Priority: Minor


This JIRA tracks changes to the REST APIs of the reservation system and enables 
querying the reservation system for which reservations exist by "time-range and 
reservation-id". 

This task has a dependency on YARN-4340.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4420) Add REST API for List Reservations

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4420:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2572

> Add REST API for List Reservations
> --
>
> Key: YARN-4420
> URL: https://issues.apache.org/jira/browse/YARN-4420
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, fairscheduler, resourcemanager
>Reporter: Sean Po
>Assignee: Sean Po
>Priority: Minor
>
> This JIRA tracks changes to the REST APIs of the reservation system and 
> enables querying the reservation system for which reservations exist by 
> "time-range and reservation-id". 
> This task has a dependency on YARN-4340.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039648#comment-15039648
 ] 

Sunil G commented on YARN-4405:
---

Yes, it's a compile problem
{noformat}
sb_work2/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15]
 cannot find symbol
[ERROR] symbol:   class NonAppendableFSNodeLabelStore
[ERROR] location: class 
org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
[ERROR] -> [Help 1]
{noformat}

Usually such issues are handled with an addendum patch in the same ticket, so it 
could be handled here. [~leftnoteasy], could you please help with this?

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039628#comment-15039628
 ] 

Kai Zheng commented on YARN-4405:
-

YARN-4419 was just opened. Please help check and comment on it. Thanks.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4419) Trunk building failed

2015-12-03 Thread Kai Zheng (JIRA)
Kai Zheng created YARN-4419:
---

 Summary: Trunk building failed
 Key: YARN-4419
 URL: https://issues.apache.org/jira/browse/YARN-4419
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Kai Zheng


Checking out the latest codes, mvn clean package -DskipTests failed as below.
{noformat}
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hadoop-yarn-common ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 72 source files to 
/home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15]
 cannot find symbol
  symbol:   class NonAppendableFSNodeLabelStore
  location: class 
org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
[INFO] 1 error
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039614#comment-15039614
 ] 

Hudson commented on YARN-3840:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #662 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/662/])
YARN-3840. Resource Manager web ui issue when sorting application by id 
(jianhe: rev 9f77ccad735f4843ce2c38355de9f434838d4507)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-sorting/natural.js
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Varun Saxena
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, 
> YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, 
> YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre
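
The commit above wires a natural-sort plugin (natural.js) into the DataTables-based pages; the idea, sketched here in Java purely for illustration, is to compare the numeric parts of IDs as numbers rather than as strings, so that 10000 sorts after 9999:

{code:java}
import java.util.Comparator;

// Illustration only (the actual fix is the natural.js DataTables sorter):
// split IDs such as "application_1449100000000_10000" on '_' and compare
// numeric fields numerically instead of lexicographically.
public final class NaturalIdComparator implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    String[] pa = a.split("_");
    String[] pb = b.split("_");
    int n = Math.min(pa.length, pb.length);
    for (int i = 0; i < n; i++) {
      int c;
      if (pa[i].matches("\\d+") && pb[i].matches("\\d+")) {
        c = Long.compare(Long.parseLong(pa[i]), Long.parseLong(pb[i]));
      } else {
        c = pa[i].compareTo(pb[i]);
      }
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(pa.length, pb.length);
  }
}
{code}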



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-03 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039613#comment-15039613
 ] 

Jun Gong commented on YARN-3480:


[~jianhe] thanks for the reminder. I thought the final solution was "we only have 
(limits + asynchronous recovery) for services, once YARN-1039 goes in", so I was 
waiting for YARN-1039. 

However, what you just suggested is reasonable too; it depends on how important 
we think app history information is. We have already implemented it and it 
works well in our cluster, so I could port it to trunk. I will attach a patch 
against trunk later.

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> that makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes 
> the RM recovery process much slower. It might be better to set a maximum 
> number of attempts to be stored in RMStateStore.
> BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large. So we need 
> to delete some attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3201) add args for DistributedShell to specify an image for tasks that will run on Docker

2015-12-03 Thread yarntime (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yarntime reassigned YARN-3201:
--

Assignee: yarntime

> add args for DistributedShell to specify an image for tasks that will run on 
> Docker
> --
>
> Key: YARN-3201
> URL: https://issues.apache.org/jira/browse/YARN-3201
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: applications/distributed-shell
>Reporter: zhangwei
>Assignee: yarntime
>
> It's very useful to execute a script on Docker to do some tests, but 
> DistributedShell has no args to set the image.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3669) Attempt-failures validity interval should have a global admin configurable lower limit

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039609#comment-15039609
 ] 

Hadoop QA commented on YARN-3669:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 326, now 327). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6

[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039597#comment-15039597
 ] 

Hong Zhiguo commented on YARN-4002:
---

I'm working on it. I've proposed 2 different solutions and am waiting for 
specific comments.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.
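
As a self-contained illustration of the two alternatives described above (a read-write lock versus a lock-free volatile reference swap); the class and field names below are hypothetical, not the actual NodesListManager/HostsFileReader code:

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: shows the locking choices discussed in the description.
public class HostListSketch {
  // Option 1: heartbeat threads take the read lock, "refresh nodes" the write lock.
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Option 2: publish a new immutable set via a volatile reference; readers need
  // no lock because reference assignment is atomic in Java.
  private volatile Set<String> includes = Collections.emptySet();

  public boolean isValidNodeWithReadLock(String host) {
    lock.readLock().lock();
    try {
      return includes.isEmpty() || includes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  public boolean isValidNodeLockFree(String host) {
    Set<String> snapshot = includes; // single volatile read
    return snapshot.isEmpty() || snapshot.contains(host);
  }

  public void refreshNodes(Set<String> newIncludes) {
    lock.writeLock().lock();
    try {
      includes = Collections.unmodifiableSet(new HashSet<>(newIncludes));
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}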



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039594#comment-15039594
 ] 

Hudson commented on YARN-4405:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8917 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8917/])
YARN-4405. Support node label store in non-appendable file system. (jianhe: rev 
755dda8dd8bb23864abc752bad506f223fcac010)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add an 
> implementation that supports such non-appendable file systems as well. 
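
A minimal sketch of the non-append approach the description calls for: rewrite a full mirror to a temporary file and move it into place instead of appending an edit log. The file names and the serialization step are assumptions, not the actual FileSystemNodeLabelsStore implementation.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: write the complete state each time, then swap the mirror file,
// so no append support is required from the underlying file system.
public class NonAppendableStoreSketch {
  static void writeMirror(Path storeDir, String serializedLabels) throws IOException {
    Path tmp = storeDir.resolve("nodelabels.mirror.tmp");
    Path mirror = storeDir.resolve("nodelabels.mirror");
    Files.write(tmp, serializedLabels.getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, mirror, StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}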



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-03 Thread Sunil G (JIRA)
Sunil G created YARN-4418:
-

 Summary: AM Resource Limit per partition can be updated to 
ResourceUsage as well
 Key: YARN-4418
 URL: https://issues.apache.org/jira/browse/YARN-4418
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G


AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
better to track this ResourceLimit in the existing {{ResourceUsage}} so that the 
REST framework can easily make use of this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039563#comment-15039563
 ] 

Wangda Tan commented on YARN-4309:
--

Hi [~vvasudev],

Thanks for working on this task, it's really useful for identifying container 
launch issues. Some questions/comments:
- Since the debug information fetch script (copying scripts and listing files) is at 
the end of launch_container.sh, is it possible that a container is killed before 
that script can be executed?
- Do you think it would be better to generate a separate script file to fetch debug 
information before launching the user code? That way we can:
1. Guarantee it will be executed.
2. Avoid adding debug information to the normal launch_container.sh.
3. Keep the return code of the launch script unaffected by the debug script.
- Is it possible to enable/disable this feature while the NM is running? 

+[~sidharta-s].

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
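
A hedged sketch of the proposal (not the actual NodeManager patch): on failure, write a listing of the container local dir and a copy of launch_container.sh into the container log dir so log aggregation picks them up. The helper and output file names are illustrative.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: dump debug information about a failed container into its log dir.
public class ContainerDebugInfoSketch {
  public static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir)
      throws IOException {
    try (Stream<Path> files = Files.walk(containerLocalDir)) {
      List<String> listing = files.map(Path::toString).collect(Collectors.toList());
      Files.write(containerLogDir.resolve("directory.info"), listing);
    }
    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript, containerLogDir.resolve("launch_container.sh"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}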



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039560#comment-15039560
 ] 

Jian He commented on YARN-3480:
---

[~hex108], I know it's been a long time; would you still like to work on this?
IMO, as a first step, we can do as in the previous 
[comment|https://issues.apache.org/jira/browse/YARN-3480?focusedCommentId=14533731&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14533731]
 and remove the apps beyond the validity interval, as those are mostly the apps 
users care about least. cc [~xgong]. 
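
A minimal sketch of that first step, with hypothetical types (not the actual RMStateStore data model): keep only the attempts whose finish time still falls within the validity interval.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: prune stored attempts that finished before the validity window.
public class AttemptPruningSketch {
  static class StoredAttempt {
    final long finishTimeMs;
    StoredAttempt(long finishTimeMs) { this.finishTimeMs = finishTimeMs; }
  }

  static List<StoredAttempt> pruneBeyondValidity(
      List<StoredAttempt> attempts, long nowMs, long validityIntervalMs) {
    return attempts.stream()
        .filter(a -> nowMs - a.finishTimeMs <= validityIntervalMs)
        .collect(Collectors.toList());
  }
}
{code}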

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> that makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes 
> the RM recovery process much slower. It might be better to set a maximum 
> number of attempts to be stored in RMStateStore.
> BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large. So we need 
> to delete some attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039551#comment-15039551
 ] 

Wangda Tan commented on YARN-4304:
--

[~sunilg], 
I'm fine with changing ResourceUsage in a separate JIRA (logic change only, 
not renaming), but I think it's better to finish the ResourceUsage change before 
this patch. That way we can have a clearer view of this patch once the 
ResourceUsage changes are completed.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configurations related to the same. 
> For example, the current UI still shows the am-resource percentage at the queue 
> level. This is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (labels also) and 
> on the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039550#comment-15039550
 ] 

Sunil G commented on YARN-4405:
---

Thanks [~leftnoteasy]. Latest patch looks good.
 +1.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add an 
> implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039547#comment-15039547
 ] 

Sunil G commented on YARN-4304:
---

Thanks [~leftnoteasy] for the comments. 
Yes, it's fine and we can change those variables as per the comments. 
For {{ResourceUsage}}, since it's existing code, we need to add these new 
items and do the renaming as suggested. Both of these can be tracked together in 
another ticket and later used here. I'll create another ticket 
for that if you feel it's fine. Thank you. 

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configurations related to the same. 
> For example, the current UI still shows the am-resource percentage at the queue 
> level. This is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (labels also) and 
> on the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039539#comment-15039539
 ] 

Wangda Tan commented on YARN-4405:
--

[~sunilg], please let me know if you have any other comments on the latest patch.

Thanks,

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add an 
> implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039537#comment-15039537
 ] 

Wangda Tan commented on YARN-3368:
--

This task depends on some fixes to the REST APIs, which are tracked by YARN-4417.

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: (POC, Aug-2015)) yarn-ui-screenshots.zip

Archived and reattached old screenshots

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: (was: Queue-Hierarchy-Screenshot.png)

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: (was: Applications-table-Screenshot.png)

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: (Dec 3 2015) yarn-ui-screenshots.zip

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039534#comment-15039534
 ] 

Sangjin Lee commented on YARN-4238:
---

[~varun_saxena], it seems like it should be relatively easy to correct the 2 
checkstyle violations. Could you also take a look at the unit test failures, 
especially TestSystemMetricsPublisher to see if it is related? Thanks!

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.02.patch, YARN-4238-feature-YARN-2928.03.patch
>
>
> While publishing entities from RM and elsewhere, we are not sending the 
> created time. For instance, the created time in the TimelineServiceV2Publisher 
> class, and for other entities in other similar classes, is not updated. We can 
> easily set the created time when sending the application-created event, and 
> likewise the modification time on every write.
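
A hedged sketch of the idea with hypothetical types (not the ATSv2 API): stamp the created time once, when the "created" event is published, and refresh the modified time on every write.

{code:java}
// Sketch only: illustrates the createdTime/modifiedTime handling the
// description asks for, using placeholder types.
public class TimelinePublishSketch {
  static class Entity {
    Long createdTime;
    Long modifiedTime;
  }

  static void publishCreatedEvent(Entity e) {
    e.createdTime = System.currentTimeMillis();
    e.modifiedTime = e.createdTime;
  }

  static void publishUpdate(Entity e) {
    e.modifiedTime = System.currentTimeMillis();
  }
}
{code}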



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039532#comment-15039532
 ] 

Wangda Tan commented on YARN-3368:
--

Hi guys,

Some updates on the new YARN web UI work:
- Rewrote the UI framework (still using Ember.js) so new pages/charts can be 
added more easily.
- Finished the scheduler page, which contains a selectable queue hierarchy view 
and related scheduler information.
- Finished the apps table, app page, app-attempt page and the timeline view of 
app-attempts/containers.
- Finished the cluster overview page, which contains several charts showing an 
overview of the cluster (such as total memory, total apps, etc.).

Attached screenshots to demonstrate the latest changes.

There are still many pending tasks:
- Finalize the design.
- Node managers' view page.
- Bugs / hardcoded configurations.

In order to advance the work faster, I propose to create a new branch 
(YARN-3368) so more people can participate in development/discussion. All 
UI-related code changes will be in a separate folder 
"hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui".

Thanks to [~Sreenath], [~vinodkv], [~jianhe], [~gtCarrera] for their support and 
suggestions.

Thoughts?

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: (was: YARN-3368.poc.1.patch)

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039529#comment-15039529
 ] 

Hadoop QA commented on YARN-4311:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s 
{color} | {color:red} Patch generated 1 new checkstyle issues in root (total 
was 396, now 396). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s 
{color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color

[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-12-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039527#comment-15039527
 ] 

Sunil G commented on YARN-4292:
---

Thank you very much [~leftnoteasy]  for the review and commit. 

> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, 
> 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039522#comment-15039522
 ] 

Hadoop QA commented on YARN-4238:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
12s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 54s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 31s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
48s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 28s 
{color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new 
issues (was 779, now 779). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 54s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 24m 22s 
{color} | {color:red} root-jdk1.7.0_85 with JDK v1.7.0_85 generated 2 new 
issues (was 772, now 772). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 54s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s 
{color} | {color:red} Patch generated 2 new checkstyle issues in root (total 
was 401, now 339). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
31s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 50s {color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 0s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch pa

[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2015-12-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039519#comment-15039519
 ] 

Sidharta Seethana commented on YARN-3542:
-

Hi [~vvasudev],

Thank you for the patch. I took a look at it, and it is a bit unclear how 
the new configs/resource handler are meant to interact with the existing 
{{CgroupsLCEResourcesHandler}}. IMO, one of the goals here is to deprecate 
{{CgroupsLCEResourcesHandler}} and use the new resource handler mechanism so 
that all resource handling/isolation is done in a consistent manner. 

Could you please provide a description of the changes introduced in this patch 
and what the interaction would be with the existing CPU cgroups implementation 
(especially from a configuration perspective)?

thanks,
-Sidharta

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch
>
>
> In YARN-3443, a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - see YARN-2140). We should refactor the existing CPU 
> implementation (LinuxContainerExecutor/CgroupsLCEResourcesHandler) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2015-12-03 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana reassigned YARN-3542:
---

Assignee: Sidharta Seethana  (was: Varun Vasudev)

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch
>
>
> In YARN-3443, a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - see YARN-2140). We should refactor the existing CPU 
> implementation (LinuxContainerExecutor/CgroupsLCEResourcesHandler) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039502#comment-15039502
 ] 

Jian He commented on YARN-4403:
---

The patch looks good to me. 
One minor suggestion: do you think we can change the base 
AbstractLivelinessMonitor to have a default constructor with MonotonicClock, 
so callers can use that constructor instead?

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry duration, which can be broken by settimeofday. We 
> should use Time.monotonicNow() instead.
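
A self-contained illustration of the difference (not the actual LivelinessMonitor code); it assumes Time.monotonicNow() is backed by a monotonic source such as System.nanoTime():

{code:java}
import java.util.concurrent.TimeUnit;

// Sketch only: why expiry checks should use a monotonic clock. Wall-clock time
// can jump backwards or forwards when settimeofday changes the system time;
// a monotonic clock only moves forward.
public class ExpiryCheckSketch {
  private static final long EXPIRE_INTERVAL_MS = 10 * 60 * 1000;

  public static boolean expiredWallClock(long lastHeartbeatWallMs) {
    // Fragile: breaks if the system time is changed.
    return System.currentTimeMillis() - lastHeartbeatWallMs > EXPIRE_INTERVAL_MS;
  }

  public static boolean expiredMonotonic(long lastHeartbeatMonotonicMs) {
    long nowMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    return nowMs - lastHeartbeatMonotonicMs > EXPIRE_INTERVAL_MS;
  }
}
{code}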



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3669) Attempt-failures validity interval should have a global admin configurable lower limit

2015-12-03 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3669:
--
Assignee: Xuan Gong  (was: Vinod Kumar Vavilapalli)

> Attempt-failures validity interval should have a global admin configurable 
> lower limit
> ---
>
> Key: YARN-3669
> URL: https://issues.apache.org/jira/browse/YARN-3669
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Xuan Gong
>  Labels: newbie
> Attachments: YARN-3669.1.patch
>
>
> Found this while reviewing YARN-3480.
> bq. When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large. So we need 
> to delete some attempts stored in the RMStateStore.
> I think we need to have a lower limit on the failure-validity interval to 
> avoid situations like this.
> Having this will avoid pardoning too many failures in too short a duration.
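
A tiny sketch of the proposed lower limit; the config name and default value below are hypothetical, not existing YARN configuration:

{code:java}
// Sketch only: clamp the per-application validity interval to an
// admin-configured floor so a tiny interval cannot pardon an unbounded number
// of failures in a very short window.
public class ValidityIntervalSketch {
  // e.g. yarn.resourcemanager.am.attempt-failures-validity-interval.min-ms (hypothetical)
  static final long ADMIN_MIN_VALIDITY_INTERVAL_MS = 60_000L;

  static long effectiveValidityInterval(long appRequestedMs) {
    return Math.max(appRequestedMs, ADMIN_MIN_VALIDITY_INTERVAL_MS);
  }

  public static void main(String[] args) {
    System.out.println(effectiveValidityInterval(1_000L)); // -> 60000
  }
}
{code}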



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4417) Make RM and Timeline-server REST APIs more consistent

2015-12-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4417:


 Summary: Make RM and Timeline-server REST APIs more consistent
 Key: YARN-4417
 URL: https://issues.apache.org/jira/browse/YARN-4417
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan


There are some differences between the RM and timeline-server REST APIs; for 
example, the RM REST API doesn't support getting application attempt info by app-id 
and attempt-id, but the timeline server does. We could make them more consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3669) Attempt-failures validity interval should have a global admin configurable lower limit

2015-12-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3669:

Attachment: YARN-3669.1.patch

> Attempt-failures validity interval should have a global admin configurable 
> lower limit
> ---
>
> Key: YARN-3669
> URL: https://issues.apache.org/jira/browse/YARN-3669
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>  Labels: newbie
> Attachments: YARN-3669.1.patch
>
>
> Found this while reviewing YARN-3480.
> bq. When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large. So we need 
> to delete some attempts stored in the RMStateStore.
> I think we need to have a lower limit on the failure-validity interval to 
> avoid situations like this.
> Having this will avoid pardoning too many failures in too short a duration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4417) Make RM and Timeline-server REST APIs more consistent

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4417:
-
Attachment: YARN-4417.1.patch

Attached initial patch for review.

> Make RM and Timeline-server REST APIs more consistent
> -
>
> Key: YARN-4417
> URL: https://issues.apache.org/jira/browse/YARN-4417
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4417.1.patch
>
>
> There are some differences between the RM and timeline-server REST APIs; for 
> example, the RM REST API doesn't support getting application attempt info by 
> app-id and attempt-id, but the timeline server does. We could make them more 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: YARN-4340.v5.patch

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id, username".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: (was: YARN-4340.v8.patch)

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id, username".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-03 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: YARN-4340.v8.patch

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id, username".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038803#comment-15038803
 ] 

Hadoop QA commented on YARN-1974:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 (total was 148, now 148). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 21s {color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 8s 
{color} | {color:green} hadoop-yarn-applications-distributedshell in the patch 
passed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 54s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775669/YARN-1974.1.patch |
| JIRA Issue | YARN-1974 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 613c1f2f9c94 3.13.0-36-lowlatency 

[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038769#comment-15038769
 ] 

Wangda Tan commented on YARN-2885:
--

[~kkaranasos], 
bq. ... That said, I am not sure if it is required to create a wrapper at this 
point for the AM protocol.
As suggested by [~asuresh], 
bq. Have an Distributed Scheduler AM Service running on the RM if DS is 
enabled. This will implement the new protocol (it will delegate all the 
AMProtocol stuff to the AMService and will handle DistScheduler specific stuff)
Do you think it's a good idea?

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038763#comment-15038763
 ] 

Hadoop QA commented on YARN-4405:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
24s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 1s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s 
{color} | {color:red} Patch generated 7 new checkstyle issues in root (total 
was 264, now 267). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
57s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 13 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 45s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 49s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:gr

[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038768#comment-15038768
 ] 

Wangda Tan commented on YARN-2885:
--

Hi [~asuresh],
Thanks for the reply.

bq. What we were aiming for is to not send any Queueable resource reqs to the 
RM...
After thinking about it, the RM could directly support queue-able container 
allocation. Since the queue-able/guaranteed executionType is part of the 
user-facing API, the scheduler can decide whether or not to allocate a 
queue-able container. The LocalRM is one way to allocate queue-able containers, 
but please make sure there's no assumption (hardcoded logic) that queue-able 
containers can only be allocated by the LocalRM.

bq. I totally agree that the AM should not be bothered with this.. But if you 
notice, It is actually not set by the AM, it set by the 
DistSchedulerReqeustInterceptor when it proxies the AM calls...
Since you planned to have a LocalRM coordinator, I would prefer to add a 
separate Distributed Scheduler Coordinator service and protocols.

The other comments make sense to me.


> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038760#comment-15038760
 ] 

Wangda Tan commented on YARN-2877:
--

Hi [~kkaranasos],
Thanks for the reply:
bq. We are planning to address this by having smaller heartbeat intervals in 
the AM-LocalRM communication when compared to the LocalRM-RM. For instance, the 
AM-LocalRM heartbeat interval can be set to 50ms, while the LocalRM-RM interval 
to 200ms (in other words, we will only propagate to the RM only one in every 
four heartbeats).
Maybe you could also take a look at HADOOP-11552, which could possibly achieve 
better latency and reduce heartbeat frequency.

bq. This is a valid concern. The best way to minimize preemption is through the 
"top-k node list" technique described above. As the LocalRM will be placing the 
QUEUEABLE containers to the least loaded nodes, preemption will be minimized.
I think the top-k node list technique cannot completely solve the 
oversubscription issue. In a production cluster, applications come in waves, 
and a few large applications can exhaust all resources in the cluster within a 
few seconds. Another possible approach to mitigate the issue is propagating 
queue-able containers from the NM to the RM periodically, so the NM can still 
make the decision while the RM is also aware of these queue-able containers.
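
As a rough illustration of this periodic-propagation idea (the class and method 
names below are purely hypothetical, not existing YARN code), the NM could 
piggyback a snapshot of its locally queued containers on every Nth node 
heartbeat, so the RM stays approximately aware of queue-able load without extra 
RPC traffic:
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical NM-side helper: piggyback queued-container info on every Nth node heartbeat.
public class QueueableInfoPropagator {
  private final int propagationRatio; // e.g. 4 => report to the RM on 1 of every 4 heartbeats
  private long heartbeatCount = 0;

  public QueueableInfoPropagator(int propagationRatio) {
    this.propagationRatio = propagationRatio;
  }

  /** Returns a snapshot to attach to this heartbeat, or null to skip the RM update. */
  public synchronized List<ContainerId> infoForHeartbeat(List<ContainerId> locallyQueued) {
    heartbeatCount++;
    if (heartbeatCount % propagationRatio == 0) {
      return new ArrayList<>(locallyQueued); // RM becomes aware of queue-able load periodically
    }
    return null; // NM keeps making local decisions; RM is not updated on this heartbeat
  }
}
{code}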

bq. That said, as you also mention, QUEUEABLE containers are more suitable for 
short-running tasks, where the probability of a container being preempted is 
smaller.
Ideally it's better to support all non-long-running-service tasks. The LocalRM 
could allocate short-running queue-able tasks, and the RM can allocate other 
queue-able tasks.

> Extend YARN to support distributed scheduling
> -
>
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>Assignee: Konstantinos Karanasos
> Attachments: distributed-scheduling-design-doc_v1.pdf
>
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency for tasks where the scheduling time dominates 
> (i.e., the task execution time is much less than the time required for 
> obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038743#comment-15038743
 ] 

Hudson commented on YARN-4292:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8915 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8915/])
YARN-4292. ResourceUtilization should be a part of NodeInfo REST API. (wangda: 
rev a2c3bfc8c1349102a7f2bc4ea96b80b429ac227b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceUtilizationInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java
* hadoop-yarn-project/CHANGES.txt


> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, 
> 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2015-12-03 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4311:
--
Attachment: YARN-4311-v3.patch

Fixed one test failure in TestYarnConfigurationFields by adding the new configs 
to yarn-default. TestAMAuthorization and TestClientRMTokens are unrelated and 
fail as per YARN-4318 and YARN-4306.

Corrected check-style issues.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch
>
>
> In order to fully forget about a node, removing the node from include and 
> exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The 
> tricky part that [~jlowe] pointed out was the case when include lists are not 
> used, in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run

2015-12-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1974:
---

Assignee: Xuan Gong  (was: Hong Zhiguo)

> add args for DistributedShell to specify a set of nodes on which the tasks run
> --
>
> Key: YARN-1974
> URL: https://issues.apache.org/jira/browse/YARN-1974
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Affects Versions: 2.7.0
>Reporter: Hong Zhiguo
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-1974.1.patch, YARN-1974.patch
>
>
> It's very useful to execute a script on a specific set of machines for both 
> testing and maintenance purposes.
> The args "--nodes" and "--relax_locality" are added to DistributedShell, 
> together with a unit test using miniCluster.
> It's also tested on our real cluster with the Fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run

2015-12-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1974:

Attachment: YARN-1974.1.patch

> add args for DistributedShell to specify a set of nodes on which the tasks run
> --
>
> Key: YARN-1974
> URL: https://issues.apache.org/jira/browse/YARN-1974
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Affects Versions: 2.7.0
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1974.1.patch, YARN-1974.patch
>
>
> It's very useful to execute a script on a specific set of machines for both 
> testing and maintenance purposes.
> The args "--nodes" and "--relax_locality" are added to DistributedShell, 
> together with a unit test using miniCluster.
> It's also tested on our real cluster with the Fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038723#comment-15038723
 ] 

Wangda Tan commented on YARN-4225:
--

Thanks [~jlowe], I can understand the issue now.

I'm OK with both approaches - the existing one in the latest patch, or simply 
returning false if there's no such field in the proto.

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038703#comment-15038703
 ] 

Wangda Tan commented on YARN-4304:
--

Hi [~sunilg],
I took a look at the REST API implementation in the latest patch; some comments:

By design, PartitionResourcesInfo/ResourcesInfo should be used by 
user/queue/app, so we need to make the fields and their usage generic across 
these components.
- amResourceLimit is meaningful to all components. The app doesn't use that 
field for now, but we can keep it and set it to infinite.
- userAMResourceLimit is not meaningful to queue/app, and it overlaps with 
user.resourcesInfo.amResourceLimit. I suggest removing it; we can use the 
amResourceLimit of the queue's first user to show on the UI. Another reason is 
that in the future different users could have different amResourceLimits.

Also, ResourcesInfo is the RESTful mapping of ResourceUsage, so the necessary 
changes need to be added to ResourceUsage as well (maybe rename ResourceUsage 
to ResourcesInformation?). The renaming could be done in a separate JIRA, but I 
suggest changing the ResourceUsage implementation in this JIRA.

If you agree with the above, ResourcesInfo's constructor shouldn't depend on 
LeafQueue and considerAMUsage; it should simply copy fields from ResourceUsage.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4408) NodeManager still reports negative running containers

2015-12-03 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038649#comment-15038649
 ] 

Robert Kanter commented on YARN-4408:
-

Thanks [~djp]!

> NodeManager still reports negative running containers
> -
>
> Key: YARN-4408
> URL: https://issues.apache.org/jira/browse/YARN-4408
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 2.8.0
>
> Attachments: YARN-4408.001.patch, YARN-4408.002.patch, 
> YARN-4408.003.patch
>
>
> YARN-1697 fixed a problem where the NodeManager metrics could report a 
> negative number of running containers.  However, it missed a rare case where 
> this can still happen.
> YARN-1697 added a flag to indicate if the container was actually launched 
> ({{LOCALIZED}} to {{RUNNING}}) or not ({{LOCALIZED}} to {{KILLING}}), which 
> is then checked when transitioning from {{CONTAINER_CLEANEDUP_AFTER_KILL}} to 
> {{DONE}} and {{EXITED_WITH_FAILURE}} to {{DONE}} to only decrement the gauge 
> if we actually ran the container and incremented the gauge.  However, this 
> flag is not checked while transitioning from {{EXITED_WITH_SUCCESS}} to 
> {{DONE}}.
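
Schematically, the missing check amounts to guarding the decrement with the 
same launched flag on the {{EXITED_WITH_SUCCESS}} to {{DONE}} transition; the 
names below (wasLaunched, runningContainersGauge) are illustrative stand-ins, 
not the actual ContainerImpl/NodeManagerMetrics code:
{code}
// Schematic only: decrement the running-containers gauge only if it was incremented at launch.
class ContainerDoneTransitionExample {
  private boolean wasLaunched;         // set when LOCALIZED -> RUNNING actually happened
  private int runningContainersGauge;  // stand-in for the NM metrics gauge

  void onExitedWithSuccessToDone() {
    if (wasLaunched) {
      runningContainersGauge--;        // matches the earlier increment, so it can never go negative
    }
    // ...rest of the DONE handling...
  }
}
{code}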



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038582#comment-15038582
 ] 

Hudson commented on YARN-3840:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8914 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8914/])
YARN-3840. Resource Manager web ui issue when sorting application by id 
(jianhe: rev 9f77ccad735f4843ce2c38355de9f434838d4507)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-sorting/natural.js
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Varun Saxena
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, 
> YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, 
> YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-03 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4238:
---
Attachment: YARN-4238-feature-YARN-2928.03.patch

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.02.patch, YARN-4238-feature-YARN-2928.03.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038566#comment-15038566
 ] 

Varun Saxena commented on YARN-4238:


[~sjlee0], sorry, I had missed your comment. Will fix the checkstyle issues and 
update the patch shortly.

bq. I think it is reasonable to say that the clients are required to set 
creation time and modification time, or they will not be present in the data 
and things like sort will not work correctly on those records. What do you 
think?
I agree. Client sending created time is a reasonable assumption to make. We can 
mention this explicitly as well when we document ATSv2.


> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038560#comment-15038560
 ] 

Wangda Tan commented on YARN-4416:
--

Thanks for reporting this issue, [~Naganarasimha].

I looked at the code; none of the methods used by 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#toString
 need to be synchronized:
- queueCapacity and resource-usage have their own read/write locks.
- numContainers is volatile.
- a read/write lock could be added to OrderingPolicy; read operations don't 
need to be synchronized, so getNumApplications doesn't need to be synchronized 
either (a rough sketch follows below).
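
A minimal sketch of that read/write-lock pattern (illustrative names only, not 
the actual AbstractCSQueue code): getters take the read lock, so 
toString()-style reads never need the exclusive monitor, while mutations take 
the write lock:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: guard queue state with a ReadWriteLock instead of synchronized getters.
class QueueStateExample {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity = 0f;

  float getAbsoluteUsedCapacity() {            // read path: shared lock, no deadlock with other readers
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setAbsoluteUsedCapacity(float value) {  // write path: exclusive lock
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}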

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis found 
> that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would better be handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Brook Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038392#comment-15038392
 ] 

Brook Zhou commented on YARN-4002:
--

If this is currently not being worked on, I will assign it to myself.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So an RWLock could be used to allow concurrent access by RPC 
> threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the 
> reader-side lock could just be skipped.
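
A minimal sketch of that reader-side pattern, assuming a hypothetical holder 
class rather than the actual HostsFileReader/NodesListManager code: refresh 
swaps in new immutable sets by reference assignment, so heartbeat-handling 
readers need no lock at all:
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: refresh swaps in new immutable sets; readers just dereference.
class HostListsExample {
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    // Reference assignment is atomic in Java, so no reader-side lock is required.
    this.includes = Collections.unmodifiableSet(new HashSet<>(newIncludes));
    this.excludes = Collections.unmodifiableSet(new HashSet<>(newExcludes));
  }

  boolean isValidNode(String host) {
    Set<String> inc = includes;   // single volatile read gives a consistent snapshot of that set
    Set<String> exc = excludes;
    return (inc.isEmpty() || inc.contains(host)) && !exc.contains(host);
  }
}
{code}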



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038381#comment-15038381
 ] 

Naganarasimha G R commented on YARN-3946:
-

Hi [~wangda],
Some of the test failures seem to be related to the patch. I would also merge 
{{checkAndUpdateAMContainerDiagnostics}} and {{updateAMContainerDiagnostics}} 
using an additional parameter rather than another method. Will upload a new 
patch at the earliest. 

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch
>
>
> Currently there is no direct way to get the exact reason why a 
> submitted app is still in the ACCEPTED state. It should be possible to know 
> through the RM REST API which aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4416:

Attachment: deadlock.log

Attaching the stack trace for the deadlock.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis found 
> that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would better be handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-03 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4416:
---

 Summary: Deadlock due to synchronised get Methods in 
AbstractCSQueue
 Key: YARN-4416
 URL: https://issues.apache.org/jira/browse/YARN-4416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.1
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor


While debugging in Eclipse, I came across a scenario where I had to find out 
the name of the queue, but every time I tried to inspect the queue it hung. On 
seeing the stack I realized there was a deadlock, but on analysis found that it 
was only due to *queue.toString()* during debugging, as 
{{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized and 
would better be handled through read and write locks.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038332#comment-15038332
 ] 

Naganarasimha G R commented on YARN-4415:
-

As per an offline discussion with [~wangda], he mentioned that this was done 
intentionally: the default max capacity of a partition is set to zero to avoid 
having to configure the queue.
IMHO it's much easier if we assume the max capacity is 100% and calculate the 
abs max based on the parent queue's max capacity, for the following reasons:
# It will have the same behavior as the default partition, hence less 
confusion.
# Maybe my understanding is wrong, but I feel it's easier to add new partitions 
without touching the CS.xml, since we can set the accessible node labels to * 
and assume 100% as the max capacity and 0% as the guaranteed capacity.

We also need to update the documentation with the default values.


> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create an exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both issues the cause is the same: the default max capacity and abs max 
> capacity are set to zero %



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4415:

Attachment: screenshot-1.png

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create an exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both issues the cause is the same: the default max capacity and abs max 
> capacity are set to zero %



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover

2015-12-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038294#comment-15038294
 ] 

Jason Lowe commented on YARN-4392:
--

+1 for the latest patch, if we go with re-sending of events upon recovery.

I think re-sending of events is "safer" assuming the redundant events are 
handled properly.  That way if we missed an event we will fill that gap upon 
recovery.  There is the concern of extra load it generates on the RM and ATS 
during recovery.  Note that we probably will miss ATS events upon recovery in 
some scenarios if we don't re-send since ATS event posting is async and state 
store updating are async.  There's a race where we could update the state store 
and crash before the ATS event is sent.

> ApplicationCreatedEvent event time resets after RM restart/failover
> ---
>
> Key: YARN-4392
> URL: https://issues.apache.org/jira/browse/YARN-4392
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch, 
> YARN-4392.2.patch
>
>
> {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - 
> Finished time 1437453994768 is ahead of started time 1440308399674 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437454008244 is ahead of started time 1440308399676 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444305171 is ahead of started time 1440308399653 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444293115 is ahead of started time 1440308399647 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444379645 is ahead of started time 1440308399656 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444361234 is ahead of started time 1440308399655 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444342029 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444323447 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143730006 is ahead of started time 1440308399660 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143715698 is ahead of started time 1440308399659 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143719060 is ahead of started time 1440308399658 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444393931 is ahead of started time 1440308399657
> {code} . 
> From ATS logs, we would see a large amount of 'stale alerts' messages 
> periodically



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-12-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038283#comment-15038283
 ] 

Kuhu Shukla commented on YARN-4386:
---

[~sunilg], [~djp], request for comments on how to test this, since even during 
the transition the RMNode is removed from the active list first and then put in 
the inactive RMNode list. Unless there are two refreshNodes calls done in 
parallel, such that the first deactivateNodeTransition has not finished while 
the other refreshNodes is trying to do the same transition, only one of them 
would succeed, and this would not be a race (?). Let me know if that makes sense.

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038282#comment-15038282
 ] 

Wangda Tan commented on YARN-4405:
--

Thanks [~sunilg].

Addressed #2/#3. For #1, I was using 
{code}
fs.create(newTmpPath, true);
{code}
So the old tmp file will always be overwritten by the new one.

Attaching ver.4 patch.
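
For context, a rough sketch of the overwrite-then-replace write path this 
relies on (the helper and path names are assumptions, not the actual patch 
code): each write rewrites the whole mirror into a temp path with overwrite 
enabled and then swaps it into place, so no append support is needed:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch of a non-appendable store write: rewrite a temp mirror, then replace the old one.
class NonAppendableStoreExample {
  static void writeMirror(FileSystem fs, Path mirrorPath, Path newTmpPath, byte[] serializedState)
      throws IOException {
    // Overwrite=true, so a stale tmp file left behind by a crash is simply clobbered.
    try (FSDataOutputStream out = fs.create(newTmpPath, true)) {
      out.write(serializedState);
    }
    fs.delete(mirrorPath, false);       // remove the old mirror (result ignored if it did not exist)
    fs.rename(newTmpPath, mirrorPath);  // publish the new mirror in its place
  }
}
{code}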


> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation to support such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4405) Support node label store in non-appendable file system

2015-12-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4405:
-
Attachment: YARN-4405.4.patch

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add 
> an implementation to support such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038269#comment-15038269
 ] 

Kuhu Shukla commented on YARN-4413:
---

The current patch for YARN-4386*

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4413.001.patch
>
>
> If I decommission a node and then move it from the excludes list back to the 
> includes list, but I don't restart the node, the node will still be listed by 
> the web UI as decomissioned until either the NM or RM is restarted.  Ideally, 
> removing the node from the excludes list and putting it back into the 
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038265#comment-15038265
 ] 

Kuhu Shukla commented on YARN-4413:
---

YARN-4386 tracks the RECOMMISSION check. The current patch does not have a test 
since it's an invalid check.

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4413.001.patch
>
>
> If I decommission a node and then move it from the excludes list back to the 
> includes list, but I don't restart the node, the node will still be listed by 
> the web UI as decomissioned until either the NM or RM is restarted.  Ideally, 
> removing the node from the excludes list and putting it back into the 
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4413:
---
Attachment: YARN-4413.001.patch

Here's the approach I propose.

I should note that I noticed that the graceful decommission code allows an 
illegal state transition:

{code}
if (entry.getValue().getState() == NodeState.DECOMMISSIONING
|| entry.getValue().getState() == NodeState.DECOMMISSIONED) {
  this.rmContext.getDispatcher().getEventHandler()
  .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
}
{code}

DECOMMISSIONED -> RECOMMISSION is not allowed.  This issue is coincidentally 
fixed by this patch.
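
For illustration, a minimal variant of the snippet above that avoids the 
illegal transition (this is only a sketch, not the attached patch): RECOMMISSION 
is sent solely for DECOMMISSIONING nodes, and DECOMMISSIONED nodes are handled 
separately:
{code}
// Illustrative only: RECOMMISSION is sent solely for DECOMMISSIONING nodes;
// DECOMMISSIONED -> RECOMMISSION is not a valid transition, so handle that state separately.
if (entry.getValue().getState() == NodeState.DECOMMISSIONING) {
  this.rmContext.getDispatcher().getEventHandler()
      .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
} else if (entry.getValue().getState() == NodeState.DECOMMISSIONED) {
  // handle separately, e.g. report the node as shutdown instead of recommissioning it
}
{code}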

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4413.001.patch
>
>
> If I decommission a node and then move it from the excludes list back to the 
> includes list, but I don't restart the node, the node will still be listed by 
> the web UI as decomissioned until either the NM or RM is restarted.  Ideally, 
> removing the node from the excludes list and putting it back into the 
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4415:

Attachment: App info with diagnostics info.png

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create a exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both issues cause is the same default max capacity and abs max capacity 
> is set to Zero %



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-03 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4415:
---

 Summary: Scheduler Web Ui shows max capacity for the queue is 100% 
but when we submit application doesnt get assigned
 Key: YARN-4415
 URL: https://issues.apache.org/jira/browse/YARN-4415
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.2
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R



Steps to reproduce the issue :
Scenario 1:
# Configure a queue(default) with accessible node labels as *
# create an exclusive partition *xxx* and map a NM to it
# ensure no capacities are configured for default for label xxx
# start an RM app with queue as default and label as xxx
# application is stuck but scheduler ui shows 100% as max capacity for that 
queue

Scenario 2:
# create a nonexclusive partition *sharedPartition* and map a NM to it
# ensure no capacities are configured for default queue
# start an RM app with queue as *default* and label as *sharedPartition*
# application is stuck but scheduler ui shows 100% as max capacity for that 
queue for *sharedPartition*

For both issues the cause is the same: the default max capacity and abs max 
capacity are set to zero %




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038231#comment-15038231
 ] 

Sidharta Seethana commented on YARN-4309:
-

Could you please edit the comment mentioned above to make it a bit clearer? 
Maybe include a note, both in yarn-default.xml's description of the config 
flag and in the code, that symlinks could be followed outside the current 
directory?

Also, there seem to be a few spurious empty lines introduced in 
DockerContainerExecutor. Apart from this, the latest patch seems good to me.



> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038195#comment-15038195
 ] 

Jason Lowe commented on YARN-4225:
--

With deployments of multiple clusters it's easy to get into situations where 
newer clients end up talking to older clusters.  Clusters are rarely upgraded 
at the same time, and remote access of one cluster for a job's input is not 
rare at least in our setups.  That's a perfect example of a newer client 
talking to an older server.  It may not be officially supported, but it's not 
going to be a rare occurrence (for HDFS, at least).

For this specific case I think the need for the feature would be rare, hence my 
hedging about whether we really need it.  It's not that the new client 
completely breaks when talking to the older server; it would just be capable of 
returning misleading information about the preemption status.  I'm OK if we 
decide this scenario isn't worth supporting.  (It's not hard to do so, just 
tedious and a bit messy with the API.)  But in general there will be cases 
where breaking compatibility between a newer client and an older server is 
going to be problematic even if it isn't officially supported because of the 
multiple cluster scenarios.
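
Purely as an illustration of the client-side handling being hedged about here, 
a small self-contained sketch that treats an absent preemption field as unknown 
rather than defaulting it; the QueueStatus type and field names below are 
hypothetical, not the actual YARN API:

{code:java}
import java.util.Optional;

public class PreemptionStatusCompatSketch {
  // Hypothetical client-side view of a queue-status response; an older server
  // simply never populates the preemption field.
  static final class QueueStatus {
    final String queueName;
    final Optional<Boolean> preemptionDisabled;

    QueueStatus(String queueName, Optional<Boolean> preemptionDisabled) {
      this.queueName = queueName;
      this.preemptionDisabled = preemptionDisabled;
    }
  }

  // Report "unknown" instead of a misleading default when the server predates
  // the field; this is the tedious-but-doable API handling mentioned above.
  static String describePreemption(QueueStatus status) {
    return status.preemptionDisabled
        .map(disabled -> disabled ? "Preemption : disabled"
                                  : "Preemption : enabled")
        .orElse("Preemption : unknown (server does not report it)");
  }

  public static void main(String[] args) {
    System.out.println(describePreemption(
        new QueueStatus("default", Optional.empty())));
    System.out.println(describePreemption(
        new QueueStatus("default", Optional.of(true))));
  }
}
{code}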

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038169#comment-15038169
 ] 

Hadoop QA commented on YARN-3816:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
24s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
53s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
feature-YARN-2928 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 10s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} Patch generated 18 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 362, now 367). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 34s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 
generated 2 new issues (was 100, now 100). {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-common in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 35s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 20s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 51s 
{color}

[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038135#comment-15038135
 ] 

Daniel Templeton commented on YARN-4413:


bq. But a restart will help here to clear the metrics.

True, but it will also cause an outage, which comes with its own potential 
impact.

bq. So I feel we could look both lists upon refresh and remove/add nodes based 
on the entries in both files and from memory.

Agreed.  I'll post a patch with my general approach shortly.
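
A rough standalone sketch of that general approach, assuming hypothetical 
in-memory sets for the includes/excludes files and the nodes known to the RM; 
this is not the actual NodesListManager code:

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class RefreshNodesSketch {
  enum NodeState { RUNNING, DECOMMISSIONED, SHUTDOWN }

  /**
   * On refresh, reconcile every node we currently know about against both the
   * includes and excludes lists, instead of acting on the excludes list alone.
   * A node that was decommissioned but is now back in the includes list (and
   * no longer excluded) is reported as SHUTDOWN rather than DECOMMISSIONED.
   */
  static Map<String, NodeState> refresh(Map<String, NodeState> knownNodes,
                                        Set<String> includes,
                                        Set<String> excludes) {
    Map<String, NodeState> updated = new HashMap<>(knownNodes);
    for (Map.Entry<String, NodeState> e : knownNodes.entrySet()) {
      String host = e.getKey();
      // An empty includes file conventionally means all hosts are allowed.
      boolean included = includes.isEmpty() || includes.contains(host);
      if (excludes.contains(host)) {
        updated.put(host, NodeState.DECOMMISSIONED);
      } else if (included && e.getValue() == NodeState.DECOMMISSIONED) {
        updated.put(host, NodeState.SHUTDOWN);
      }
    }
    return updated;
  }

  public static void main(String[] args) {
    Map<String, NodeState> known = new HashMap<>();
    known.put("host1", NodeState.DECOMMISSIONED);
    System.out.println(refresh(known,
        Collections.singleton("host1"), Collections.emptySet()));
    // prints {host1=SHUTDOWN}
  }
}
{code}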

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> If I decommission a node and then move it from the excludes list back to the 
> includes list, but I don't restart the node, the node will still be listed by 
> the web UI as decommissioned until either the NM or RM is restarted.  Ideally, 
> removing the node from the excludes list and putting it back into the 
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038107#comment-15038107
 ] 

Jason Lowe commented on YARN-4414:
--

I noticed that the HA proxies for the namenode and resourcemanager explicitly 
disable connection retries in the RPC layer by default, since the HA proxy 
layer will handle the retries itself.  I think the same should apply to 
nodemanager proxies: we're seeing even connection timeouts retried too many 
times in the RPC layer, and a container allocation is worthless after 10 
minutes by default.  By disabling retries in the RPC layer, we can add 
ConnectTimeoutException back to the list of exceptions retried at the NM proxy 
layer and simply retry all appropriate exceptions there.
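
A minimal sketch of what disabling retries in the RPC layer could look like on 
the configuration used to build the NM proxy, mirroring the HA-proxy pattern 
described above; the helper below is illustrative, not the actual patch, though 
the two ipc.client.connect properties are standard Hadoop IPC settings:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class NMProxyRpcRetrySketch {
  /**
   * Return a copy of the configuration with RPC-layer connection retries
   * turned off, so connection failures (including connect timeouts) surface
   * immediately and are retried only by the NM proxy's own retry policy.
   */
  static Configuration withRpcRetriesDisabled(Configuration conf) {
    Configuration nmProxyConf = new Configuration(conf);
    // Standard Hadoop IPC client settings; 0 means the RPC layer does not
    // retry the connect attempt itself.
    nmProxyConf.setInt("ipc.client.connect.max.retries", 0);
    nmProxyConf.setInt("ipc.client.connect.max.retries.on.timeouts", 0);
    return nmProxyConf;
  }
}
{code}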

> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-03 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4414:


 Summary: Nodemanager connection errors are retried at multiple 
levels
 Key: YARN-4414
 URL: https://issues.apache.org/jira/browse/YARN-4414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.2, 2.7.1
Reporter: Jason Lowe


This is related to YARN-3238.  Ran into more scenarios where connection errors 
are being retried at multiple levels, like NoRouteToHostException.  The fix for 
YARN-3238 was too specific, and I think we need a more general solution to 
catch a wider array of connection errors that can occur to avoid retrying them 
both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

