[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964735#comment-14964735
 ] 

Sunil G commented on YARN-4041:
---

Test case failures look related; I will debug and check.

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4272) CLI MiniCluster fails to launch MiniYARNCluster due to NoClassDefFoundError

2015-10-20 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved YARN-4272.

Resolution: Duplicate

YARN-429 moved the test-jar of hadoop-yarn-server-tests in order to exclude it 
from the classpath. HADOOP-9891 updated the documentation to tell users to add 
the jar to the classpath explicitly.

I'm closing this as a duplicate of HADOOP-9891.

> CLI MiniCluster fails to launch MiniYARNCluster due to NoClassDefFoundError
> ---
>
> Key: YARN-4272
> URL: https://issues.apache.org/jira/browse/YARN-4272
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>
> [CLI 
> MiniCluster|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/CLIMiniCluster.html]
>  fails due to NoClassDefFoundError because the 
> hadoop-yarn-server-tests-*-SNAPSHOT-tests.jar containing MiniYARNCluster is 
> not included in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4275) Rest API to expose YARN CLASSPATH and other useful system properties

2015-10-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964816#comment-14964816
 ] 

Steve Loughran commented on YARN-4275:
--

This is a duplicate of YARN-1565. Please always search JIRA before filing new 
issues, not just for the history but because the older ones have more watchers.

Since this is the later filing, I'm closing this one and assigning the older 
one to you. Submit your patch there and we'll start the review process.

> Rest API to expose YARN CLASSPATH and other useful system properties
> 
>
> Key: YARN-4275
> URL: https://issues.apache.org/jira/browse/YARN-4275
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Pradeep Subrahmanion
> Attachments: YARN-4275-001.patch, YARN-4275-002.patch
>
>
> Currently, to build an AM request, the application client needs to know the 
> YARN CLASSPATH. The application client has no way to fetch YARN's default 
> CLASSPATH or the yarn.application.classpath property in yarn-site.xml through 
> the REST API.
> Introduce a new REST API to fetch the YARN CLASSPATH and other system 
> properties that may be useful for those implementing REST clients. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4275) Rest API to expose YARN CLASSPATH and other useful system properties

2015-10-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964819#comment-14964819
 ] 

Steve Loughran commented on YARN-4275:
--

...sorry, I can't assign YARN-1565; I don't have the perms to add you to the 
right group. Just submit your patch there anyway.

> Rest API to expose YARN CLASSPATH and other useful system properties
> 
>
> Key: YARN-4275
> URL: https://issues.apache.org/jira/browse/YARN-4275
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Pradeep Subrahmanion
> Attachments: YARN-4275-001.patch, YARN-4275-002.patch
>
>
> Currently, to build an AM request, the application client needs to know the 
> YARN CLASSPATH. The application client has no way to fetch YARN's default 
> CLASSPATH or the yarn.application.classpath property in yarn-site.xml through 
> the REST API.
> Introduce a new REST API to fetch the YARN CLASSPATH and other system 
> properties that may be useful for those implementing REST clients. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4270) Limit application resource reservation on nodes for non-node/rack specific requests

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964859#comment-14964859
 ] 

Hudson commented on YARN-4270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2454 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2454/])
YARN-4270. Limit application resource reservation on nodes for (arun suresh: 
rev 7e2837f830382835838c82398db6fc9823d612a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java


> Limit application resource reservation on nodes for non-node/rack specific 
> requests
> ---
>
> Key: YARN-4270
> URL: https://issues.apache.org/jira/browse/YARN-4270
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 2.8.0
>
> Attachments: YARN-4270.1.patch, YARN-4270.2.patch, YARN-4270.3.patch, 
> YARN-4270.4.patch, YARN-4270.5.patch
>
>
> It has been noticed that for off-switch requests, the FairScheduler reserves 
> resources on all nodes. This could lead to the entire cluster being 
> unavailable for all other applications.
> Ideally, the reservations should be made on a configurable number of nodes, 
> defaulting to 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4221) Store user in app to flow table

2015-10-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964882#comment-14964882
 ] 

Varun Saxena commented on YARN-4221:


Ok, will rebase.

> Store user in app to flow table
> ---
>
> Key: YARN-4221
> URL: https://issues.apache.org/jira/browse/YARN-4221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4221-YARN-2928.01.patch, 
> YARN-4221-YARN-2928.02.patch
>
>
> We should store the user as well in the app-to-flow table.
> For queries where the user is not supplied and the flow context can be 
> retrieved from the app-to-flow table, we should take the user from the 
> app-to-flow table instead of considering the UGI as the default user.
> This is as per the discussion on YARN-3864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4270) Limit application resource reservation on nodes for non-node/rack specific requests

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964931#comment-14964931
 ] 

Hudson commented on YARN-4270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #517 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/517/])
YARN-4270. Limit application resource reservation on nodes for (arun suresh: 
rev 7e2837f830382835838c82398db6fc9823d612a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> Limit application resource reservation on nodes for non-node/rack specific 
> requests
> ---
>
> Key: YARN-4270
> URL: https://issues.apache.org/jira/browse/YARN-4270
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 2.8.0
>
> Attachments: YARN-4270.1.patch, YARN-4270.2.patch, YARN-4270.3.patch, 
> YARN-4270.4.patch, YARN-4270.5.patch
>
>
> It has been noticed that for off-switch requests, the FairScheduler reserves 
> resources on all nodes. This could lead to the entire cluster being 
> unavailable for all other applications.
> Ideally, the reservations should be made on a configurable number of nodes, 
> defaulting to 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-10-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964965#comment-14964965
 ] 

Rohith Sharma K S commented on YARN-2729:
-

bq. Hence i am reverting the fix for the comment which was given by Rohith 
Sharma K S. IIUC there are no impacts if we do super.serviceStop though 
theoritically its not the right way to do . Thoughts ?
The reason I suggested in my earlier comment, i.e. {{In serviceStop in 
ScriptBasedNodeLabelsProvider, move super.serviceStop(); to the end of 
serviceStop() so that it first stops shexec and then stops the parent}}, is the 
following. Consider that the NM has services A, B, C, D which are started in 
the order A, B, C, D. When stopping, the services are stopped in reverse order, 
i.e. D, C, B, A. Say D has a child class D1. If you call super.serviceStop() 
first, D1's own cleanup does not happen first; instead the stop order becomes 
D, C, B, A with D1's cleanup at the very end.

A better way to avoid all such conflicts is to create an abstract method, say 
cleanUp(), in AbstractNodeLabelProvider. This method can be overridden in 
ScriptBasedNodeLabelsProvider. A sample change is shown below:
{code}
  @Override
  protected void serviceStop() throws Exception {
    if (nodeLabelsScheduler != null) {
      nodeLabelsScheduler.cancel();
    }
    cleanUp();
    super.serviceStop();
  }
{code}

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch, 
> YARN-2729.20150925-1.patch, YARN-2729.20151015-1.patch, 
> YARN-2729.20151019.patch, YARN-2729.20151310-1.patch, 
> YARN-2729.20151310-2.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup.
> Miscellaneous issues:
> # In ConfigurationNodeLabelsProvider, instead of taking multiple labels from a 
> single configuration, we need to support exclusive configuration for a 
> partition (single label).
> # Proper logging when registration of a node fails.
> # The classloader was not getting reset from the custom class loader in 
> TestConfigurationNodeLabelsProvider.java, which could make test cases fail in 
> certain conditions.
> # In ResourceTrackerService we need to consider distributed configuration only 
> when node labels are enabled; otherwise this leads to lots of logs in certain 
> conditions.
> # NodeLabelsProvider needs to be an interface rather than an abstract class. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4041:
--
Attachment: 0003-YARN-4041.patch

Updating patch after test case fix.

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964999#comment-14964999
 ] 

Sunil G commented on YARN-2729:
---

Yes [~rohithsharma].
With this, the child class will have the ability to clean up anything that 
needs to be done after the timer is cancelled.
So {{shExec.destroy()}} can be called from {{cleanUp()}}, which is 
implemented/overridden in {{ScriptBasedNodeLabelsProvider}}.
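
For illustration, a minimal sketch of what that override could look like; the 
{{shExec}} field name and {{shExec.destroy()}} call are taken from the 
discussion above, not from the committed patch:
{code}
  // Sketch only: assumes AbstractNodeLabelsProvider declares the abstract
  // cleanUp() hook proposed above, and that shExec is the executor running
  // the node-labels script.
  @Override
  protected void cleanUp() {
    if (shExec != null) {
      shExec.destroy();   // stop the running script before the parent stops
    }
  }
{code}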

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch, 
> YARN-2729.20150925-1.patch, YARN-2729.20151015-1.patch, 
> YARN-2729.20151019.patch, YARN-2729.20151310-1.patch, 
> YARN-2729.20151310-2.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup.
> Miscellaneous issues:
> # In ConfigurationNodeLabelsProvider, instead of taking multiple labels from a 
> single configuration, we need to support exclusive configuration for a 
> partition (single label).
> # Proper logging when registration of a node fails.
> # The classloader was not getting reset from the custom class loader in 
> TestConfigurationNodeLabelsProvider.java, which could make test cases fail in 
> certain conditions.
> # In ResourceTrackerService we need to consider distributed configuration only 
> when node labels are enabled; otherwise this leads to lots of logs in certain 
> conditions.
> # NodeLabelsProvider needs to be an interface rather than an abstract class. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965001#comment-14965001
 ] 

Sunil G commented on YARN-4164:
---

Hi [~rohithsharma],
Thanks for updating the patch.

I have one suggestion here: {{UpdateApplicationPriorityResponse}} can now 
report back the updated priority. But the user won't understand what exactly 
this is, as the name suggests only a priority. So it could be named more like 
{{successfullyUpdatedPriority}}, rather than keeping a success flag.
Also, when we skip the priority update, as in the cases mentioned in YARN-4141, 
we can set this return value to "null" or "n/a" to indicate that the operation 
was not performed. Thoughts?
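
To make the suggestion concrete, a rough client-side sketch; the response 
getter is illustrative of the behaviour proposed above, not the committed API:
{code}
// Illustrative only: a getter on the response carrying the applied priority
// is the proposal above, not necessarily the final API.
Priority requested = Priority.newInstance(10);
UpdateApplicationPriorityResponse response =
    appClientProtocol.updateApplicationPriority(
        UpdateApplicationPriorityRequest.newInstance(applicationId, requested));
Priority applied = response.getApplicationPriority();   // hypothetical getter
if (applied == null || !applied.equals(requested)) {
  // the RM clamped the priority (e.g. to cluster.max-priority) or skipped it
}
{code}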

> Retrospect update ApplicationPriority API return type
> -
>
> Key: YARN-4164
> URL: https://issues.apache.org/jira/browse/YARN-4164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4164.patch
>
>
> Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API 
> returns an empty UpdateApplicationPriorityResponse.
> But the RM updates the priority to cluster.max-priority if the given priority 
> is greater than cluster.max-priority. In this scenario, we need to inform the 
> client of the updated priority rather than keeping quiet, which leaves the 
> client assuming that the given priority itself was applied.
> The same scenario can also happen during application submission, but I feel 
> that when the update is explicitly invoked via 
> ApplicationClientProtocol#updateApplicationPriority(), the response should 
> include the updated priority. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4279) Mark ApplicationId and ApplicationAttemptId static methods as @Public, @Unstable

2015-10-20 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4279:


 Summary: Mark ApplicationId and ApplicationAttemptId static 
methods as @Public, @Unstable
 Key: YARN-4279
 URL: https://issues.apache.org/jira/browse/YARN-4279
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.7.1
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor


The classes {{ApplicationId}} and {{ApplicationAttemptId}} both have 
{{newInstance()}} methods tagged as {{@Private}}. Yet they are useful in 
testing, as the alternative is to create and configure the PBImpl classes 
-which are significantly more private.

The fact that mapreduce's {{MRBuilderUtils}} uses one of the methods shows that 
YARN apps do need access to the methods.

Marking them as public would make it clear that other YARN apps were using them 
for their production or test code, rather than today, where they are used and 
depended on, yet without the YARN team's knowledge.
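
For context, the kind of test-side usage in question, assuming the standard 
signatures in hadoop-yarn-api (the class name, timestamp and counters here are 
illustrative only):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class AppIdFixture {
  public static ApplicationAttemptId firstAttempt() {
    // Build synthetic IDs for a test without touching the PBImpl classes.
    ApplicationId appId = ApplicationId.newInstance(1445299200000L, 42);
    return ApplicationAttemptId.newInstance(appId, 1);
  }
}
{code}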



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4279) Mark ApplicationId and ApplicationAttemptId static methods as @Public, @Unstable

2015-10-20 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4279:
-
Attachment: YARN-4279-001.patch

patch 001, changes markers on newInstance methods. no tests -untestable

> Mark ApplicationId and ApplicationAttemptId static methods as @Public, 
> @Unstable
> 
>
> Key: YARN-4279
> URL: https://issues.apache.org/jira/browse/YARN-4279
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-4279-001.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> The classes {{ApplicationId}} and {{ApplicationAttemptId}} both have 
> {{newInstance()}} methods tagged as {{@Private}}. Yet they are useful in 
> testing, as the alternative is to create and configure the PBImpl classes 
> -which are significantly more private.
> The fact that mapreduce's {{MRBuilderUtils}} uses one of the methods shows 
> that YARN apps do need access to the methods.
> Marking them as public would make it clear that other YARN apps were using 
> them for their production or test code, rather than today, where they are 
> used and depended on, yet without the YARN team's knowledge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965071#comment-14965071
 ] 

Hadoop QA commented on YARN-4041:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  57m 50s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 36s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767585/0003-YARN-4041.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9490/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9490/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9490/console |


This message was automatically generated.

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4279) Mark ApplicationId and ApplicationAttemptId static methods as @Public, @Unstable

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965115#comment-14965115
 ] 

Hadoop QA commented on YARN-4279:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 31s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  7s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| | |  41m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767594/YARN-4279-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9491/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9491/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9491/console |


This message was automatically generated.

> Mark ApplicationId and ApplicationAttemptId static methods as @Public, 
> @Unstable
> 
>
> Key: YARN-4279
> URL: https://issues.apache.org/jira/browse/YARN-4279
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-4279-001.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> The classes {{ApplicationId}} and {{ApplicationAttemptId}} both have 
> {{newInstance()}} methods tagged as {{@Private}}. Yet they are useful in 
> testing, as the alternative is to create and configure the PBImpl classes 
> -which are significantly more private.
> The fact that mapreduce's {{MRBuilderUtils}} uses one of the methods shows 
> that YARN apps do need access to the methods.
> Marking them as public would make it clear that other YARN apps were using 
> them for their production or test code, rather than today, where they are 
> used and depended on, yet without the YARN team's knowledge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-20 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965146#comment-14965146
 ] 

Jun Gong commented on YARN-4256:


[~zxu], could you please help review it?

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch
>
>
> When the vcores for a queue are given with a decimal value, FairScheduler 
> takes the value after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted in parallel to the queue, all were in 
> the PENDING state because the vcores were taken as 2, skipping the value 40.
> The pattern-matching code for vcores in FairSchedulerConfiguration.java has to 
> be improved to either throw 
> AllocationConfigurationException("Missing resource") or consider the value 
> before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2015-10-20 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-4280:
-

 Summary: CapacityScheduler reservations may not prevent indefinite 
postponement on a busy cluster
 Key: YARN-4280
 URL: https://issues.apache.org/jira/browse/YARN-4280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 2.7.1, 2.6.1, 2.8.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Consider the following scenario:

There are 2 queues, A (25% of the total capacity) and B (75%); both can run at 
total cluster capacity. There are 2 applications: appX, which runs on Queue A, 
always asking for 1 GB containers (non-AM), and appY, which runs on Queue B 
asking for 2 GB containers.
The user limit is high enough for the applications to reach 100% of the cluster 
resource.

appX is running at total cluster capacity, full with 1 GB containers, releasing 
only one container at a time. appY comes in with a request for a 2 GB container 
but only 1 GB is free. Ideally, since appY is in the underserved queue, it has 
higher priority and should reserve for its 2 GB request. But since this request 
puts the alloc+reserve above the total capacity of the cluster, the reservation 
is not made. appX then comes in with a 1 GB request, and since 1 GB is still 
available, the request is allocated.

This can continue indefinitely, causing priority inversion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-10-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965349#comment-14965349
 ] 

Sangjin Lee commented on YARN-3798:
---

[~ozawa], can it be verified and merged today? I am targeting tomorrow to cut 
the branch and create the release candidate for 2.6.2. Thanks!

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.6.02.patch, 
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, 
> YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, 
> YARN-3798-branch-2.7.006.patch, YARN-3798-branch-2.7.patch
>
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> 

[jira] [Updated] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM

2015-10-20 Thread Pradeep Subrahmanion (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Subrahmanion updated YARN-1565:
---
Attachment: YARN-1565-001.patch

This patch introduces a new REST API to fetch YARN system properties. 
Currently only the CLASSPATH is returned through this API, but it can be 
extended to return other YARN system properties that may be useful for REST 
clients.

I notice that my patch doesn't address all the points mentioned in this JIRA, 
but I am submitting it as per the last update in YARN-4275.
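
As a rough illustration of the intended usage, a client-side sketch; the 
endpoint path and JSON field name are placeholders and are not taken from the 
attached patch:
{code}
// Hypothetical client usage; "/ws/v1/cluster/conf/classpath" and the
// "classPath" field are placeholders, not the actual endpoint in the patch.
URL url = new URL("http://rm-host:8088/ws/v1/cluster/conf/classpath");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Accept", "application/json");
try (BufferedReader in = new BufferedReader(
    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
  System.out.println(in.readLine());  // e.g. {"classPath":"$HADOOP_CONF_DIR,..."}
}
{code}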



> Add a way for YARN clients to get critical YARN system properties from the RM
> -
>
> Key: YARN-1565
> URL: https://issues.apache.org/jira/browse/YARN-1565
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
> Attachments: YARN-1565-001.patch
>
>
> If you are trying to build up an AM request, you need to know
> # the limits of memory, core &c for the chosen queue
> # the existing YARN classpath
> # the path separator for the target platform (so your classpath comes out 
> right)
> # cluster OS: in case you need some OS-specific changes
> The classpath can be in yarn-site.xml, but a remote client may not have that. 
> The site-xml file doesn't list Queue resource limits, cluster OS or the path 
> separator.
> A way to query the RM for these values would make it easier for YARN clients 
> to build up AM submissions with less guesswork and client-side config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)
Chang Li created YARN-4281:
--

 Summary: 2.7 RM app page is broken
 Key: YARN-4281
 URL: https://issues.apache.org/jira/browse/YARN-4281
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Fix For: 2.7.2


The 2.7 RM app page is broken by the cherry-pick of YARN-3248 on 23/Sep. It 
broke the workaround 2.7 patch of YARN-3544 that let the page still use the 
container report. Currently, our cluster's 2.7 RM app page is completely broken 
by a 500 error: when the user UGI is null, a completed app cannot retrieve its 
container report, and that code path does not catch 
ContainerNotFoundException but throws it, causing the 500 error.
The running-app page is also broken because of the way it constructs the 
container ID via
{code}
ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
{code}
which does not include the epoch number, so it also gets 
ContainerNotFoundException and throws a 500 error.
But right now we can use the branch-2 patch for YARN-3544 instead of the 
workaround 2.7 patch, because the branch-2 patch on 2.7 is no longer blocked.
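
A sketch of the second problem, for illustration only (the variable names 
follow the snippet above):
{code}
// Rebuilding the AM container id with a literal 1 always produces the
// epoch-less form of the id; after an RM restart the epoch is encoded in the
// container id, so the reconstructed id no longer matches the id under which
// the container report is stored, hence the ContainerNotFoundException.
ContainerId reconstructed =
    ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1);
{code}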



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4281:
---
Attachment: YARN-4281.2.7.modify.patch

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.2
>
> Attachments: YARN-4281.2.7.modify.patch
>
>
> 2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
> the work around 2.7 patch of YARN-3544 to let it still use container report. 
> Currently, our cluster's 2.7 RM app page is completely broken due to 500 
> error, which is caused by when user UGI is null, completed app can not 
> retrieve its container report, and in that code path, it doesn't catch 
> ContainerNotFoundException, but throw the exception, therefore cause the 500 
> error.
>  Running app is also broken because of the way it construct containerID by 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which will not include epoch number, so it will also get 
> ContainerNotFoundException and throw 500 error.
> But right now we can use the branch 2 patch for YARN-3544, instead of the 
> work around 2.7 patch because branch 2 patch on 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4281:
---
Description: 
2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
the work around 2.7 patch of YARN-3544 to let it still use container report. 
Currently, our cluster's 2.7 RM app page is completely broken due to 500 error, 
which is caused by when user UGI is null, completed app can not retrieve its 
container report, and in that code path, it doesn't catch 
ContainerNotFoundException, but throw the exception, therefore cause the 500 
error.
 Running app is also broken because of the way it construct containerID by 
{code} "ContainerId.newContainerId(
  appAttemptReport.getApplicationAttemptId(), 1)" 
{code}, 
which will not include epoch number, so it will also get 
ContainerNotFoundException and throw 500 error.
Propose to use the branch-2 version of YARN-3544, instead of the work around 
2.7 patch because branch 2 patch on 2.7 is no longer blocked.

  was:
2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
the work around 2.7 patch of YARN-3544 to let it still use container report. 
Currently, our cluster's 2.7 RM app page is completely broken due to 500 error, 
which is caused by when user UGI is null, completed app can not retrieve its 
container report, and in that code path, it doesn't catch 
ContainerNotFoundException, but throw the exception, therefore cause the 500 
error.
 Running app is also broken because of the way it construct containerID by 
{code} "ContainerId.newContainerId(
  appAttemptReport.getApplicationAttemptId(), 1)" 
{code}, 
which will not include epoch number, so it will also get 
ContainerNotFoundException and throw 500 error.
But right now we can use the branch 2 patch for YARN-3544, instead of the work 
around 2.7 patch because branch 2 patch on 2.7 is no longer blocked.


> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.2
>
> Attachments: YARN-4281.2.7.modify.patch
>
>
> 2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
> the work around 2.7 patch of YARN-3544 to let it still use container report. 
> Currently, our cluster's 2.7 RM app page is completely broken due to 500 
> error, which is caused by when user UGI is null, completed app can not 
> retrieve its container report, and in that code path, it doesn't catch 
> ContainerNotFoundException, but throw the exception, therefore cause the 500 
> error.
>  Running app is also broken because of the way it construct containerID by 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which will not include epoch number, so it will also get 
> ContainerNotFoundException and throw 500 error.
> Propose to use the branch-2 version of YARN-3544, instead of the work around 
> 2.7 patch because branch 2 patch on 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965488#comment-14965488
 ] 

zhihai xu commented on YARN-4256:
-

Thanks for reporting this issue, [~Prabhu Joseph]! Thanks for the patch, 
[~hex108]! The patch looks mostly good. Can we change '+' to '*', i.e. 
(\\.\\d+)? => (\\.\\d*)?, so we can relax the condition to support "1024. mb"?
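
A small self-contained demonstration of the difference between the two 
fractional groups; the patterns here are simplified stand-ins, not the actual 
FairSchedulerConfiguration regex:
{code}
import java.util.regex.Pattern;

public class RegexRelaxDemo {
  public static void main(String[] args) {
    // Hypothetical simplified patterns for "<number> mb"; the real
    // FairSchedulerConfiguration regex is more involved.
    Pattern strict  = Pattern.compile("(\\d+)(\\.\\d+)?\\s*mb");
    Pattern relaxed = Pattern.compile("(\\d+)(\\.\\d*)?\\s*mb");

    String input = "1024. mb";   // trailing dot with no fractional digits
    System.out.println(strict.matcher(input).matches());   // false
    System.out.println(relaxed.matcher(input).matches());  // true
  }
}
{code}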


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch
>
>
> When the vcores for a queue are given with a decimal value, FairScheduler 
> takes the value after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted in parallel to the queue, all were in 
> the PENDING state because the vcores were taken as 2, skipping the value 40.
> The pattern-matching code for vcores in FairSchedulerConfiguration.java has to 
> be improved to either throw 
> AllocationConfigurationException("Missing resource") or consider the value 
> before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965490#comment-14965490
 ] 

Hadoop QA commented on YARN-4281:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767633/YARN-4281.2.7.modify.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6381ddc |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9493/console |


This message was automatically generated.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.2
>
> Attachments: YARN-4281.2.7.modify.patch
>
>
> 2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
> the work around 2.7 patch of YARN-3544 to let it still use container report. 
> Currently, our cluster's 2.7 RM app page is completely broken due to 500 
> error, which is caused by when user UGI is null, completed app can not 
> retrieve its container report, and in that code path, it doesn't catch 
> ContainerNotFoundException, but throw the exception, therefore cause the 500 
> error.
>  Running app is also broken because of the way it construct containerID by 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which will not include epoch number, so it will also get 
> ContainerNotFoundException and throw 500 error.
> Propose to use the branch-2 version of YARN-3544, instead of the work around 
> 2.7 patch because branch 2 patch on 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4281:
---
Attachment: YARN-4281-branch-2.7.patch

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.2
>
> Attachments: YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> 2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
> the work around 2.7 patch of YARN-3544 to let it still use container report. 
> Currently, our cluster's 2.7 RM app page is completely broken due to 500 
> error, which is caused by when user UGI is null, completed app can not 
> retrieve its container report, and in that code path, it doesn't catch 
> ContainerNotFoundException, but throw the exception, therefore cause the 500 
> error.
>  Running app is also broken because of the way it construct containerID by 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which will not include epoch number, so it will also get 
> ContainerNotFoundException and throw 500 error.
> Propose to use the branch-2 version of YARN-3544, instead of the work around 
> 2.7 patch because branch 2 patch on 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965497#comment-14965497
 ] 

Hadoop QA commented on YARN-4281:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767639/YARN-4281-branch-2.7.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2 / 1a4bd5b |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9494/console |


This message was automatically generated.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.2
>
> Attachments: YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> 2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
> the work around 2.7 patch of YARN-3544 to let it still use container report. 
> Currently, our cluster's 2.7 RM app page is completely broken due to 500 
> error, which is caused by when user UGI is null, completed app can not 
> retrieve its container report, and in that code path, it doesn't catch 
> ContainerNotFoundException, but throw the exception, therefore cause the 500 
> error.
>  Running app is also broken because of the way it construct containerID by 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which will not include epoch number, so it will also get 
> ContainerNotFoundException and throw 500 error.
> Propose to use the branch-2 version of YARN-3544, instead of the work around 
> 2.7 patch because branch 2 patch on 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4281:
---
Description: 
2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
the work around 2.7 patch of YARN-3544 to let it still use container report. 
Currently, our cluster's 2.7 RM app page is completely broken due to 500 error, 
which is caused by when user UGI is null, completed app can not retrieve its 
container report, and in that code path, it doesn't catch 
ContainerNotFoundException, but throw the exception, therefore cause the 500 
error.
 Running app is also broken because of the way it construct containerID by 
{code} "ContainerId.newContainerId(
  appAttemptReport.getApplicationAttemptId(), 1)" 
{code}, 
which will not include epoch number in ID, so it will also get 
ContainerNotFoundException and throw 500 error.
Propose to use the branch-2 version of YARN-3544, instead of the work around 
2.7 patch because branch 2 patch on 2.7 is no longer blocked.

  was:
2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
the work around 2.7 patch of YARN-3544 to let it still use container report. 
Currently, our cluster's 2.7 RM app page is completely broken due to 500 error, 
which is caused by when user UGI is null, completed app can not retrieve its 
container report, and in that code path, it doesn't catch 
ContainerNotFoundException, but throw the exception, therefore cause the 500 
error.
 Running app is also broken because of the way it construct containerID by 
{code} "ContainerId.newContainerId(
  appAttemptReport.getApplicationAttemptId(), 1)" 
{code}, 
which will not include epoch number, so it will also get 
ContainerNotFoundException and throw 500 error.
Propose to use the branch-2 version of YARN-3544, instead of the work around 
2.7 patch because branch 2 patch on 2.7 is no longer blocked.


> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.2
>
> Attachments: YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> 2.7 RM app page is broken by the cherry pick of YARN-3248 on 23/Sep. It broke 
> the work around 2.7 patch of YARN-3544 to let it still use container report. 
> Currently, our cluster's 2.7 RM app page is completely broken due to 500 
> error, which is caused by when user UGI is null, completed app can not 
> retrieve its container report, and in that code path, it doesn't catch 
> ContainerNotFoundException, but throw the exception, therefore cause the 500 
> error.
>  Running app is also broken because of the way it construct containerID by 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which will not include epoch number in ID, so it will also get 
> ContainerNotFoundException and throw 500 error.
> Propose to use the branch-2 version of YARN-3544, instead of the work around 
> 2.7 patch because branch 2 patch on 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965516#comment-14965516
 ] 

Sangjin Lee commented on YARN-4129:
---

Thanks [~Naganarasimha] for the updated patch. Some comments:

(findbugs-exclude.xml)
- we should remove the obsolete exclude for AbstractTimelineServicePublisher

(AbstractSystemMetricsPublisher.java)
- l.137: I see we're using {{hashCode()}} in a specific way to ensure all 
events for the same app end up on the same thread. Still, I think we should 
also override {{equals()}} (maybe with appId + eventType) as a good practice.

(SystemMetricsPublisher.java)
- why are we removing the license header?
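
A minimal sketch of the equals()/hashCode() pairing suggested above; the class 
and field names (TimelinePublishEvent, appId, eventType) are illustrative, not 
the actual patch:
{code}
  // hashCode keyed on appId keeps all events of an app on the same thread;
  // equals additionally compares eventType, which stays consistent with it.
  @Override
  public int hashCode() {
    return appId.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof TimelinePublishEvent)) {  // hypothetical event class
      return false;
    }
    TimelinePublishEvent other = (TimelinePublishEvent) obj;
    return appId.equals(other.appId) && eventType.equals(other.eventType);
  }
{code}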

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129-YARN-2928.002.patch, 
> YARN-4129-YARN-2928.003.patch, YARN-4129.YARN-2928.001.patch
>
>
> Currently to add new timeline event/ entity in RM side, one has to add a 
> method in publisher and a method in handler and create a new event class 
> which looks cumbersome and redundant. also further all the events might not 
> be required to be published in V1 & V2. So adopting the approach similar to 
> what was adopted in YARN-3045(NM side)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965517#comment-14965517
 ] 

Hadoop QA commented on YARN-1565:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m  4s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 46s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 28s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m  6s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  62m 15s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 108m  3s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-api |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767621/YARN-1565-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9492/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9492/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9492/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9492/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9492/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9492/console |


This message was automatically generated.

> Add a way for YARN clients to get critical YARN system properties from the RM
> -
>
> Key: YARN-1565
> URL: https://issues.apache.org/jira/browse/YARN-1565
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
> Attachments: YARN-1565-001.patch
>
>
> If you are trying to build up an AM request, you need to know
> # the limits of memory, core &c for the chosen queue
> # the existing YARN classpath
> # the path separator for the target platform (so your classpath comes out 
> right)
> # cluster OS: in case you need some OS-specific changes
> The classpath can be in yarn-site.xml, but a remote client may not have that. 
> The site-xml file doesn't list Queue resource limits, cluster OS or the path 
> separator.
> A way to query the RM for these values would make it easier for YARN clients 
> to build up AM submissions with less guesswork and client-side config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2015-10-20 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1496#comment-1496
 ] 

Kuhu Shukla commented on YARN-4280:
---

[~leftnoteasy], could you share your thoughts and comments on a possible fix?

> CapacityScheduler reservations may not prevent indefinite postponement on a 
> busy cluster
> 
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.6.1, 2.8.0, 2.7.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at 
> total cluster capacity. There are 2 applications: appX runs on queue A and 
> always asks for 1 GB (non-AM) containers, while appY runs on queue B and asks 
> for 2 GB containers.
> The user limit is high enough for an application to reach 100% of the cluster 
> resources.
> appX is running at total cluster capacity, full of 1 GB containers, releasing 
> only one container at a time. appY comes in with a request for a 2 GB container 
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it has 
> higher priority and should place a reservation for its 2 GB request. But since 
> that request would put allocated+reserved above the total capacity of the 
> cluster, the reservation is not made. appX then comes in with a 1 GB request, 
> and since 1 GB is still available, the request is allocated.
> This can continue indefinitely, causing priority inversion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM

2015-10-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965579#comment-14965579
 ] 

Steve Loughran commented on YARN-1565:
--

The test failed because HTML came back.

One thing I'd recommend: if an assert isn't met (e.g. on content type or status 
code), the test should either print out the HTTP response text or (maybe) 
include it in the exception text. That way we can debug failures from the test 
results alone.
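
For example, something along these lines (a sketch only; the helper and 
parameter names are my own, not the actual test code):
{code}
// Sketch: carry the HTTP response body in the assertion message so a failure
// shows what actually came back (e.g. an HTML error page instead of JSON).
import static org.junit.Assert.assertEquals;

public class RestAssert {
  public static void assertStatusAndType(int expectedStatus, String expectedType,
      int actualStatus, String actualType, String responseBody) {
    String diag = " -- response body was:\n" + responseBody;
    assertEquals("unexpected status code" + diag, expectedStatus, actualStatus);
    assertEquals("unexpected content type" + diag, expectedType, actualType);
  }
}
{code}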

> Add a way for YARN clients to get critical YARN system properties from the RM
> -
>
> Key: YARN-1565
> URL: https://issues.apache.org/jira/browse/YARN-1565
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
> Attachments: YARN-1565-001.patch
>
>
> If you are trying to build up an AM request, you need to know
> # the limits of memory, core &c for the chosen queue
> # the existing YARN classpath
> # the path separator for the target platform (so your classpath comes out 
> right)
> # cluster OS: in case you need some OS-specific changes
> The classpath can be in yarn-site.xml, but a remote client may not have that. 
> The site-xml file doesn't list Queue resource limits, cluster OS or the path 
> separator.
> A way to query the RM for these values would make it easier for YARN clients 
> to build up AM submissions with less guesswork and client-side config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4281:
-
Fix Version/s: (was: 2.7.2)

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4281:
-
Target Version/s: 2.7.2
Priority: Blocker  (was: Major)

Marking this a blocker for 2.7.2, just like YARN-3544 was a blocker originally.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-10-20 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4179:
---
Attachment: YARN-4179-YARN-2928.02.patch

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4179-YARN-2928.01.patch, 
> YARN-4179-YARN-2928.02.patch
>
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern that's supported on the flow activity table 
> is by cluster only. But it might be useful to support queries by cluster and 
> certain date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4221) Store user in app to flow table

2015-10-20 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4221:
---
Attachment: YARN-4221-YARN-2928.03.patch

> Store user in app to flow table
> ---
>
> Key: YARN-4221
> URL: https://issues.apache.org/jira/browse/YARN-4221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4221-YARN-2928.01.patch, 
> YARN-4221-YARN-2928.02.patch, YARN-4221-YARN-2928.03.patch
>
>
> We should store the user as well in the app-to-flow table.
> For queries where the user is not supplied and the flow context can be 
> retrieved from the app-to-flow table, we should take the user from the 
> app-to-flow table instead of using the UGI as the default user.
> This is as per the discussion on YARN-3864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965685#comment-14965685
 ] 

Jason Lowe commented on YARN-4281:
--

The intent is to bring this code more in line with what's happening on 
branch-2, now that YARN-3248 was committed to branch-2.7 (and broke the RM UI 
as a result).  As such I have a few whitespace nits on the patch which, when 
fixed, should help remove some potential for merge conflicts when 
cherry-picking other changes from branch-2.  For reference I was diffing the 
patched version of AppBlock.java and RMAppBlock.java against commit 
c9ee316045b83b18cb068aa4de739a1f4b50f02a, which is where YARN-3544 went into 
branch-2.  Any diffs that don't belong to another commit delta between branch-2 
and branch-2.7 are flagged below.

For this patch hunk, the original YARN-3248 did not have the appReport = null 
line, just the appReport declaration.  Also there's an additional whitespace 
line that was added by this patch which should not be there:
{noformat:title=AppBlock.java}
@@ -87,8 +86,9 @@ protected void render(Block html) {
   return;
 }
 
-callerUGI = getCallerUGI();
-ApplicationReport appReport;
+
+UserGroupInformation callerUGI = getCallerUGI();
+ApplicationReport appReport = null;
 try {
   final GetApplicationReportRequest request =
   GetApplicationReportRequest.newInstance(appID);
{noformat}

Similarly, the patch leaves an additional whitespace line where the import code 
was deleted and adds a new whitespace line where it was added back in:
{noformat:title=RMAppBlock.java}
@@ -20,21 +20,14 @@
 
 import static org.apache.hadoop.yarn.webapp.view.JQueryUI._INFO_WRAP;
 
-import java.security.PrivilegedExceptionAction;
-import java.util.Collection;
-import java.util.Set;
 
 import org.apache.commons.lang.StringEscapeUtils;
 import org.apache.commons.logging.Log;

[...]

 import com.google.inject.Inject;
 
+
+import java.util.Collection;
+import java.util.Set;
+
 public class RMAppBlock extends AppBlock{
 
   private static final Log LOG = LogFactory.getLog(RMAppBlock.class);

{noformat}


> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965691#comment-14965691
 ] 

Hadoop QA commented on YARN-4179:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 44s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 16s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 49s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 27s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 44s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   3m 31s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  49m 41s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767650/YARN-4179-YARN-2928.02.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 581a6b6 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9495/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9495/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9495/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9495/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9495/console |


This message was automatically generated.

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4179-YARN-2928.01.patch, 
> YARN-4179-YARN-2928.02.patch
>
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern that's supported on the flow activity table 
> is by cluster only. But it might be useful to support queries by cluster and 
> certain date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3985:

Attachment: YARN-3985.005.patch

Added a retry since there are multiple events that we need to wait for. 
DrainDispatcher's await() can return once we have drained the first event from 
the queue but before the next event has been added, so a single await() does 
not work reliably.
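
Roughly the shape of the retry (a sketch using my own names, not the exact test 
code):
{code}
// Sketch: drain, check, and repeat until the expected state is reached or a
// deadline passes, instead of relying on a single await() call.
public final class RetryUntil {
  public interface Condition {
    boolean isMet();
  }

  public static void retryUntil(Condition condition, Runnable drain,
      long timeoutMs, long intervalMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      drain.run();                // e.g. dispatcher.await()
      if (condition.isMet()) {
        return;                   // the state we were waiting for has been reached
      }
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);   // give the next event a chance to be queued
    }
  }
}
{code}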

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3985:

Attachment: YARN-3985.005.patch

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-20 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2913:
--
Attachment: YARN-2913.v2.patch

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-20 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965716#comment-14965716
 ] 

Siqi Li commented on YARN-2913:
---

Hi [~mingma], Can you take a look at this patch?

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4263) Capacity scheduler 60%-40% formatting floating point issue

2015-10-20 Thread Adrian Kalaszi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Kalaszi updated YARN-4263:
-
Attachment: YARN-4263.001.patch

Please review the patch and consider applying it.

> Capacity scheduler 60%-40% formatting floating point issue
> --
>
> Key: YARN-4263
> URL: https://issues.apache.org/jira/browse/YARN-4263
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Adrian Kalaszi
>Priority: Trivial
> Attachments: YARN-4263.001.patch
>
>
> If the capacity scheduler is set up with two queues at 60% and 40% capacity, 
> then due to a Java float representation issue
> {code}
> > hadoop queue -list
> ==
> Queue Name : default 
> Queue State : running 
> Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 
> 0.0 
> ==
> Queue Name : large 
> Queue State : running 
> Scheduling Info : Capacity: 60.000004, MaximumCapacity: 100.0, 
> CurrentCapacity: 0.0 
> {code}
> This happens because 
> {code} System.err.println((0.6f) * 100); {code}
> prints 60.000004.
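
A minimal sketch of the underlying behaviour and one possible way to present the 
value (illustrative only, not the attached patch):
{code}
public class CapacityFormatDemo {
  public static void main(String[] args) {
    float capacity = 0.6f;                                      // configured as 60%
    System.out.println(capacity * 100);                         // prints 60.000004 (float artifact)
    System.out.println(String.format("%.1f", capacity * 100));  // prints 60.0
  }
}
{code}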



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4281:
---
Attachment: YARN-4281-branch-2.7.2.patch

Thanks [~jlowe] for the review! I updated my patch and addressed your concerns 
in the 2.7.2 patch.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7.2.patch, 
> YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4282) JVM reuse in Yarn

2015-10-20 Thread Yingqi Lu (JIRA)
Yingqi Lu created YARN-4282:
---

 Summary: JVM reuse in Yarn
 Key: YARN-4282
 URL: https://issues.apache.org/jira/browse/YARN-4282
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Yingqi Lu


Dear All,

Recently we identified an issue inside YARN with MapReduce: a significant 
amount of time is spent in libjvm.so, most of it in compilation.

Attached is a flame graph (visual call graph) of a query running for about 8 
minutes. Most of the yellow bars represent ‘libjvm.so’ functions, while the Java 
functions are colored red. The data show that more than 40% of overall execution 
time is spent in compilation itself, yet looking inside the JVMs a lot of code 
still runs in interpreted mode. Ideally, everything would run as compiled code 
over and over again. In reality, however, mappers and reducers are long dead 
before the compilation benefits kick in; in other words, we take the performance 
hit of both compilation and interpretation. The JVM reuse feature in MapReduce 
1.0 addressed this issue, but it was removed in YARN. We are currently tuning a 
set of JVM parameters to minimize the performance impact, but we still think it 
would be good to open a discussion here to seek a more permanent solution, since 
the issue ties into the nature of how YARN works.

We are wondering if any of you have seen this issue before, or if there is any 
ongoing project to address it?

Data for this graph was collected across the entire system with multiple JVMs 
running. The workload we use is BigBench workload 
(https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).

Thanks,
Yingqi Lu

1. Software and workloads used in performance tests may have been optimized for 
performance only on Intel microprocessors. Performance tests, such as SYSmark 
and MobileMark, are measured using specific computer systems, components, 
software, operations and functions. Any change to any of those factors may 
cause the results to vary. You should consult other information and performance 
tests to assist you in fully evaluating your contemplated purchases, including 
the performance of that product when combined with other products.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4052) Set SO_KEEPALIVE on NM servers

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4052:
---
Attachment: YARN-4052.3.patch

Fixed the whitespace issue in the .3 patch.
[~jlowe], could you help review the latest patch? Thanks!

> Set SO_KEEPALIVE on NM servers
> --
>
> Key: YARN-4052
> URL: https://issues.apache.org/jira/browse/YARN-4052
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chang Li
> Attachments: YARN-4052.2.patch, YARN-4052.3.patch, YARN-4052.patch
>
>
> Shuffle handler does not set SO_KEEPALIVE so we've seen cases where 
> FDs/sockets get stuck in ESTABLISHED state indefinitely because the server 
> did not see the client leave (network cut or otherwise). 
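
For reference, a minimal sketch of what setting the option could look like, 
assuming the Netty 3.x ServerBootstrap used by the shuffle handler in this era 
(not the actual patch):
{code}
import org.jboss.netty.bootstrap.ServerBootstrap;

public class KeepAliveSketch {
  // The "child." prefix applies the option to accepted (child) connections,
  // which sets SO_KEEPALIVE on the sockets serving shuffle fetches.
  static void enableKeepAlive(ServerBootstrap bootstrap) {
    bootstrap.setOption("child.keepAlive", true);
  }
}
{code}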



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4282) JVM reuse in Yarn

2015-10-20 Thread Yingqi Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingqi Lu updated YARN-4282:

Attachment: flamegraph.png

> JVM reuse in Yarn
> -
>
> Key: YARN-4282
> URL: https://issues.apache.org/jira/browse/YARN-4282
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Yingqi Lu
>  Labels: performance
> Attachments: flamegraph.png
>
>
> Dear All,
> Recently we identified an issue inside YARN with MapReduce: a significant 
> amount of time is spent in libjvm.so, most of it in compilation.
> Attached is a flame graph (visual call graph) of a query running for about 8 
> minutes. Most of the yellow bars represent ‘libjvm.so’ functions, while the 
> Java functions are colored red. The data show that more than 40% of overall 
> execution time is spent in compilation itself, yet looking inside the JVMs a 
> lot of code still runs in interpreted mode. Ideally, everything would run as 
> compiled code over and over again. In reality, however, mappers and reducers 
> are long dead before the compilation benefits kick in; in other words, we take 
> the performance hit of both compilation and interpretation. The JVM reuse 
> feature in MapReduce 1.0 addressed this issue, but it was removed in YARN. We 
> are currently tuning a set of JVM parameters to minimize the performance 
> impact, but we still think it would be good to open a discussion here to seek 
> a more permanent solution, since the issue ties into the nature of how YARN 
> works.
> We are wondering if any of you have seen this issue before, or if there is any 
> ongoing project to address it?
> Data for this graph was collected across the entire system with multiple JVMs 
> running. The workload we use is BigBench workload 
> (https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).
> Thanks,
> Yingqi Lu
> 1. Software and workloads used in performance tests may have been optimized 
> for performance only on Intel microprocessors. Performance tests, such as 
> SYSmark and MobileMark, are measured using specific computer systems, 
> components, software, operations and functions. Any change to any of those 
> factors may cause the results to vary. You should consult other information 
> and performance tests to assist you in fully evaluating your contemplated 
> purchases, including the performance of that product when combined with other 
> products.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965804#comment-14965804
 ] 

Anubhav Dhoot commented on YARN-3985:
-

Reran the test multiple times without failure

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4269) Log aggregation should not swallow the exception during close()

2015-10-20 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965836#comment-14965836
 ] 

Chang Li commented on YARN-4269:


[~steve_l], yes, this only helps by logging when close() fails. It is good to 
have this log so we get notified that something may have gone wrong with log 
aggregation.
[~bibinchundatt], cleanup only logs at debug level, so I created closeStream to 
log close failures at info level.
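
Roughly the idea behind closeStream (a sketch with my own names, not the exact 
patch code):
{code}
// Sketch: close the stream and surface failures at INFO instead of swallowing
// them, since a failed close() may mean the aggregated log is missing or partial.
import java.io.Closeable;
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class CloseHelper {
  private static final Log LOG = LogFactory.getLog(CloseHelper.class);

  static void closeStream(Closeable stream, String name) {
    if (stream == null) {
      return;
    }
    try {
      stream.close();
    } catch (IOException e) {
      LOG.info("Failed to close " + name + "; aggregated log may be incomplete", e);
    }
  }
}
{code}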


> Log aggregation should not swallow the exception during close()
> ---
>
> Key: YARN-4269
> URL: https://issues.apache.org/jira/browse/YARN-4269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4269.patch
>
>
> the log aggregation thread ignores exception thrown by close(). It shouldn't 
> be ignored, since the file content may be missing or partial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4269) Log aggregation should not swallow the exception during close()

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4269:
---
Attachment: YARN-4269.2.patch

upload .2 patch to address javadoc and whitespace issue

> Log aggregation should not swallow the exception during close()
> ---
>
> Key: YARN-4269
> URL: https://issues.apache.org/jira/browse/YARN-4269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4269.2.patch, YARN-4269.patch
>
>
> the log aggregation thread ignores exception thrown by close(). It shouldn't 
> be ignored, since the file content may be missing or partial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965864#comment-14965864
 ] 

Jason Lowe commented on YARN-4041:
--

Thanks for updating the patch, Sunil!

When fixing the test, why wasn't the fix in waitForTokensToBeRenewed?  Also I'm 
not thrilled with the idea of sleeping for 1 second per application and hoping 
it's enough time.  And we're getting out early when there is at least one token 
in the token set, but there's a race where we may have taken a snapshot before 
all the tokens are there.  Can't we key off the app start events coming out of 
the token renewal process to know when we're done?  Would be nice if there were 
a more reliable way so we can avoid arbitrary sleeps (which tend to slow down 
unit tests overall) and racy tests.
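
For instance, something like the following instead of fixed per-app sleeps (a 
sketch only; the collection and names here are placeholders, and the condition 
would ideally be driven by the app-start events coming out of the renewer):
{code}
// Sketch: poll for the expected state with a deadline and a short interval,
// failing fast with a useful message instead of sleeping 1 second per app.
import java.util.Set;

public class RenewalTestUtil {
  static void waitForRenewedApps(Set<?> renewedApps, int expectedCount,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (renewedApps.size() < expectedCount) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("only " + renewedApps.size() + " of "
            + expectedCount + " apps renewed within " + timeoutMs + " ms");
      }
      Thread.sleep(100);  // short poll interval
    }
  }
}
{code}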

Also noticed on subsequent look that AbsrtactDelegationTokenRenewerAppEvent s/b 
AbstractDelegationTokenRenewerAppEvent.

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965869#comment-14965869
 ] 

Hadoop QA commented on YARN-4281:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767669/YARN-4281-branch-2.7.2.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9501/console |


This message was automatically generated.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7.2.patch, 
> YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965873#comment-14965873
 ] 

Jason Lowe commented on YARN-4281:
--

There's still extra whitespace added by this deletion (note it doesn't delete 
one of the lines around the deleted block, resulting in the extra whitespace):
{noformat}
--- 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
+++ 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
@@ -20,21 +20,14 @@
 
 import static org.apache.hadoop.yarn.webapp.view.JQueryUI._INFO_WRAP;
 
-import java.security.PrivilegedExceptionAction;
-import java.util.Collection;
-import java.util.Set;
 
 import org.apache.commons.lang.StringEscapeUtils;
 import org.apache.commons.logging.Log;
{noformat}

Otherwise looks good.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7.2.patch, 
> YARN-4281-branch-2.7.patch, YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4221) Store user in app to flow table

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965894#comment-14965894
 ] 

Hadoop QA commented on YARN-4221:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  23m 18s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  12m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  14m 13s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 26s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 22s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m 13s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 58s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 12s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 53s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  58m 57s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767657/YARN-4221-YARN-2928.03.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 581a6b6 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9498/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9498/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9498/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9498/console |


This message was automatically generated.

> Store user in app to flow table
> ---
>
> Key: YARN-4221
> URL: https://issues.apache.org/jira/browse/YARN-4221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4221-YARN-2928.01.patch, 
> YARN-4221-YARN-2928.02.patch, YARN-4221-YARN-2928.03.patch
>
>
> We should store the user as well in the app-to-flow table.
> For queries where the user is not supplied and the flow context can be 
> retrieved from the app-to-flow table, we should take the user from the 
> app-to-flow table instead of using the UGI as the default user.
> This is as per the discussion on YARN-3864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4281:
---
Attachment: YARN-4281-branch-2.7-3.patch

Thanks [~jlowe] for pointing out the problem! Fixed that in the 2.7-3 patch.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7-3.patch, 
> YARN-4281-branch-2.7.2.patch, YARN-4281-branch-2.7.patch, 
> YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page was broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the workaround 2.7 patch of YARN-3544 that let the page keep using 
> container reports. Currently our cluster's 2.7 RM app page fails completely 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
> The running-app page is also broken because it constructs the container ID via 
> {code}
> ContainerId.newContainerId(appAttemptReport.getApplicationAttemptId(), 1)
> {code}
> which does not include the epoch number in the ID, so it also hits 
> ContainerNotFoundException and returns a 500 error.
> I propose using the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since applying the branch-2 patch to 2.7 is no longer blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4263) Capacity scheduler 60%-40% formatting floating point issue

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965917#comment-14965917
 ] 

Hadoop QA commented on YARN-4263:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 37s | The applied patch generated  6 
new checkstyle issues (total was 33, now 36). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 10  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 10s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 48s | Tests passed in 
hadoop-mapreduce-client-common. |
| | |  39m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767668/YARN-4263.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9499/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9499/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9499/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9499/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9499/console |


This message was automatically generated.

> Capacity scheduler 60%-40% formatting floating point issue
> --
>
> Key: YARN-4263
> URL: https://issues.apache.org/jira/browse/YARN-4263
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Adrian Kalaszi
>Priority: Trivial
>  Labels: easyfix
> Attachments: YARN-4263.001.patch
>
>
> If the capacity scheduler is set up with two queues at 60% and 40% capacity, 
> then due to a Java float representation issue
> {code}
> > hadoop queue -list
> ==
> Queue Name : default 
> Queue State : running 
> Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 
> 0.0 
> ==
> Queue Name : large 
> Queue State : running 
> Scheduling Info : Capacity: 60.000004, MaximumCapacity: 100.0, 
> CurrentCapacity: 0.0 
> {code}
> This happens because 
> {code} System.err.println((0.6f) * 100); {code}
> prints 60.000004.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4052) Set SO_KEEPALIVE on NM servers

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965939#comment-14965939
 ] 

Hadoop QA commented on YARN-4052:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 22s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  10m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  13m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 31s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 31s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  1s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 45s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 57s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 26s | Tests passed in 
hadoop-mapreduce-client-shuffle. |
| | |  52m 27s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767671/YARN-4052.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| hadoop-mapreduce-client-shuffle test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9500/artifact/patchprocess/testrun_hadoop-mapreduce-client-shuffle.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9500/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9500/console |


This message was automatically generated.

> Set SO_KEEPALIVE on NM servers
> --
>
> Key: YARN-4052
> URL: https://issues.apache.org/jira/browse/YARN-4052
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chang Li
> Attachments: YARN-4052.2.patch, YARN-4052.3.patch, YARN-4052.patch
>
>
> Shuffle handler does not set SO_KEEPALIVE so we've seen cases where 
> FDs/sockets get stuck in ESTABLISHED state indefinitely because the server 
> did not see the client leave (network cut or otherwise). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965940#comment-14965940
 ] 

Hadoop QA commented on YARN-2913:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 38s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  9s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 53s | The applied patch generated  8 
new checkstyle issues (total was 37, now 42). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  57m 23s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767664/YARN-2913.v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9496/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9496/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9496/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9496/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9496/console |


This message was automatically generated.

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965943#comment-14965943
 ] 

Hadoop QA commented on YARN-3985:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 50s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  1 
new checkstyle issues (total was 23, now 22). |
| {color:green}+1{color} | whitespace |   0m  9s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  57m 45s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767661/YARN-3985.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9497/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9497/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9497/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9497/console |


This message was automatically generated.

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4281) 2.7 RM app page is broken

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965981#comment-14965981
 ] 

Hadoop QA commented on YARN-4281:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  14m 52s | Findbugs (version ) appears to 
be broken on branch-2.7. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | patch |   0m  7s | The patch command could not apply 
the patch. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767684/YARN-4281-branch-2.7-3.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2.7 / 3f3829e |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9503/console |


This message was automatically generated.

> 2.7 RM app page is broken
> -
>
> Key: YARN-4281
> URL: https://issues.apache.org/jira/browse/YARN-4281
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Blocker
> Attachments: YARN-4281-branch-2.7-3.patch, 
> YARN-4281-branch-2.7.2.patch, YARN-4281-branch-2.7.patch, 
> YARN-4281.2.7.modify.patch
>
>
> The 2.7 RM app page is broken by the cherry-pick of YARN-3248 on 23/Sep, which 
> broke the 2.7 workaround patch of YARN-3544 that let the page keep using the 
> container report. Currently our cluster's 2.7 RM app page is completely broken 
> with a 500 error: when the user UGI is null, a completed app cannot retrieve 
> its container report, and that code path does not catch 
> ContainerNotFoundException but rethrows it, causing the 500 error.
>  The running-app page is also broken because of the way it constructs the 
> container ID with 
> {code} "ContainerId.newContainerId(
>   appAttemptReport.getApplicationAttemptId(), 1)" 
> {code}, 
> which does not include the epoch number in the ID, so it also gets a 
> ContainerNotFoundException and returns a 500 error.
> Propose to use the branch-2 version of YARN-3544 instead of the 2.7 workaround 
> patch, since the branch-2 patch is no longer blocked on 2.7.
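
A minimal sketch of the kind of guard described above (hypothetical helper, not the attached branch-2.7 patch): catch ContainerNotFoundException around the container-report lookup so the page degrades gracefully instead of returning a 500.

{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ContainerNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

final class AppPageContainerLookup {
  // Hypothetical helper, illustrative only.
  static ContainerReport tryGetAmContainer(YarnClient client,
      ApplicationAttemptId attemptId) {
    // Building the id with a fixed value of 1 drops the epoch, which is the
    // second problem described in this JIRA.
    ContainerId cid = ContainerId.newContainerId(attemptId, 1);
    try {
      return client.getContainerReport(cid);
    } catch (ContainerNotFoundException e) {
      return null; // render the page without container details, not a 500
    } catch (YarnException | IOException e) {
      return null; // any other lookup failure: also degrade gracefully
    }
  }
}
{code}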



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4269) Log aggregation should not swallow the exception during close()

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966014#comment-14966014
 ] 

Hadoop QA commented on YARN-4269:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  25m 34s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |  10m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  13m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 30s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 31s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 58s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 44s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   8m 34s | Tests failed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   2m 34s | Tests passed in 
hadoop-yarn-common. |
| | |  71m 51s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.TestLocalFsFCStatistics |
|   | hadoop.ha.TestZKFailoverController |
|   | hadoop.security.ssl.TestReloadingX509TrustManager |
|   | hadoop.test.TestTimedOutTestsListener |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767677/YARN-4269.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9502/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-common.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9502/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9502/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9502/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9502/console |


This message was automatically generated.

> Log aggregation should not swallow the exception during close()
> ---
>
> Key: YARN-4269
> URL: https://issues.apache.org/jira/browse/YARN-4269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4269.2.patch, YARN-4269.patch
>
>
> The log aggregation thread ignores exceptions thrown by close(). They should 
> not be ignored, since the file content may be missing or partial.
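
A minimal sketch of the idea (plain Java, not the attached patch): surface a close() failure to the caller instead of swallowing it, since a failed close can mean the aggregated file is truncated.

{code}
import java.io.Closeable;
import java.io.IOException;

final class CloseOrFailSketch {
  // Hypothetical helper, illustrative only: rethrow instead of ignoring.
  static void closeOrFail(Closeable logWriter) throws IOException {
    try {
      logWriter.close();               // may fail while flushing buffered data
    } catch (IOException e) {
      // Do not swallow: the aggregated log file may be missing or partial.
      throw new IOException("Failed to close aggregated log writer", e);
    }
  }
}
{code}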



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966026#comment-14966026
 ] 

Hudson commented on YARN-3985:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8673 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8673/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966040#comment-14966040
 ] 

Hudson commented on YARN-3985:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1295 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1295/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* hadoop-yarn-project/CHANGES.txt


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1418) Add Tracing to YARN

2015-10-20 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-1418:
-
Assignee: Masatake Iwasaki  (was: Yi Liu)

> Add Tracing to YARN
> ---
>
> Key: YARN-1418
> URL: https://issues.apache.org/jira/browse/YARN-1418
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, nodemanager, resourcemanager
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>
> Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274.
> Most of the changes needed for the basics, such as RPC, seem to be almost 
> ready in HDFS-5274.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-20 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4256:
---
Attachment: YARN-4256.002.patch

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores setting contains a decimal value, FairScheduler takes 
> only the digits after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all stayed in 
> the PENDING state because the vcores value was taken as 2, skipping the 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java should be 
> improved to either throw AllocationConfigurationException("Missing resource") 
> or use the value before the decimal point.
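
A minimal sketch of the stricter parsing proposed above (plain Java regex with assumed names, not the attached patch): a decimal vcores value is rejected outright rather than silently truncated to the digits after the point.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VcoresParseSketch {
  // Only whole numbers are accepted; "40.2 vcores" will not match.
  private static final Pattern VCORES =
      Pattern.compile("^\\s*(\\d+)\\s*vcores\\s*$");

  static int parseVcores(String text) {
    Matcher m = VCORES.matcher(text);
    if (!m.matches()) {
      // Mirrors the proposal in the description: fail loudly on malformed input.
      throw new IllegalArgumentException("Malformed vcores value: " + text);
    }
    return Integer.parseInt(m.group(1));
  }
}
{code}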



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-20 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966068#comment-14966068
 ] 

Jun Gong commented on YARN-4256:


Thanks [~zxu] for the review and comments.

Attached a new patch to address it and added a new test for the case '1024. mb'.

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores setting contains a decimal value, FairScheduler takes 
> only the digits after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all stayed in 
> the PENDING state because the vcores value was taken as 2, skipping the 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java should be 
> improved to either throw AllocationConfigurationException("Missing resource") 
> or use the value before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4276) Per-application-type local dirs for NodeManager

2015-10-20 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966120#comment-14966120
 ] 

He Tianyi commented on YARN-4276:
-

Thanks, [~ste...@apache.org].
I think the diverse-workload scheduling effort in YARN-2139 covers this issue.


> Per-application-type local dirs for NodeManager
> ---
>
> Key: YARN-4276
> URL: https://issues.apache.org/jira/browse/YARN-4276
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> Requesting the ability to specify per-application-type local dirs in the 
> NodeManager.
> The scenario is having both SSDs and HDDs installed on each node, and wanting 
> to launch Spark containers on the SSD drives while containers belonging to 
> other application types use the normal HDD drives.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966130#comment-14966130
 ] 

Hudson commented on YARN-3985:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #560 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/560/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966131#comment-14966131
 ] 

Hudson commented on YARN-3985:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #575 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/575/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3739) Add recovery of reservation system to RM failover process

2015-10-20 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3739:
-
Attachment: YARN-3739-v1.patch

Adding a patch that recovers the reservation system from the state persisted 
via YARN-3985, along with unit tests that verify reservations are recovered and 
that all reservation operations work correctly after failover.

> Add recovery of reservation system to RM failover process
> -
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3739-v1.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966169#comment-14966169
 ] 

Hudson commented on YARN-3985:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2508 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2508/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4283) hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-20 Thread Nemanja Matkovic (JIRA)
Nemanja Matkovic created YARN-4283:
--

 Summary: hadoop-yarn Avoid unsafe split and append on fields that 
might be IPv6 literals
 Key: YARN-4283
 URL: https://issues.apache.org/jira/browse/YARN-4283
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Nemanja Matkovic


hadoop-yarn part of HADOOP-12122 task



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4283) hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-20 Thread Nemanja Matkovic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966177#comment-14966177
 ] 

Nemanja Matkovic commented on YARN-4283:


YARN changes needed for IPv6 support to work.

> hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 
> literals
> ---
>
> Key: YARN-4283
> URL: https://issues.apache.org/jira/browse/YARN-4283
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>  Labels: ipv6
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> hadoop-yarn part of HADOOP-12122 task



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4283) hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-20 Thread Nemanja Matkovic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966178#comment-14966178
 ] 

Nemanja Matkovic commented on YARN-4283:


This can be committed only after the HDFS portion of the changes is in, so 
that everything works.

> hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 
> literals
> ---
>
> Key: YARN-4283
> URL: https://issues.apache.org/jira/browse/YARN-4283
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>  Labels: ipv6
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> hadoop-yarn part of HADOOP-12122 task



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-10-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966200#comment-14966200
 ] 

Sangjin Lee commented on YARN-4179:
---

The latest patch looks pretty good. Only a couple of minor comments.

(TimelineReaderWebServices.java)
- l.125: nit: "daterange" -> "date range" (a couple of other places too)
- l.123-140: I'm pretty sure the logic is correct and does what we intend, but 
it could use some comments to make it easier to read later. For example, 
l.138-139 could have a comment saying that they handle the case where a single 
date (without "-") was specified, and so on (see the sketch below).
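
To make that single-date case concrete, here is a hedged illustration of the kind of parsing under review (hypothetical helper, not the actual TimelineReaderWebServices code): split a "daterange" query parameter into start and end dates, treating a value with no "-" as a single day.

{code}
final class DateRangeSketch {
  // Returns {start, end}; a null entry means that side of the range is open.
  static String[] parseDateRange(String dateRange) {
    int sep = dateRange.indexOf('-');
    if (sep < 0) {
      // Single date with no "-": start and end are the same day.
      return new String[] { dateRange, dateRange };
    }
    String start = dateRange.substring(0, sep);
    String end = dateRange.substring(sep + 1);
    return new String[] { start.isEmpty() ? null : start,
                          end.isEmpty() ? null : end };
  }
}
{code}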

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4179-YARN-2928.01.patch, 
> YARN-4179-YARN-2928.02.patch
>
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern supported on the flow activity table is by 
> cluster. But it might be useful to support queries by cluster plus a certain 
> date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966208#comment-14966208
 ] 

Hadoop QA commented on YARN-4256:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 57s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  64m 35s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 110m 49s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767706/YARN-4256.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0c4af0f |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9504/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9504/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9504/console |


This message was automatically generated.

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores setting contains a decimal value, FairScheduler takes 
> only the digits after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all stayed in 
> the PENDING state because the vcores value was taken as 2, skipping the 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java should be 
> improved to either throw AllocationConfigurationException("Missing resource") 
> or use the value before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3739) Add recovery of reservation system to RM failover process

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966210#comment-14966210
 ] 

Hadoop QA commented on YARN-3739:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 33s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 48s | The applied patch generated  5 
new checkstyle issues (total was 156, now 161). |
| {color:red}-1{color} | whitespace |   0m 27s | The patch has 13  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 32s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m  3s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  99m 36s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767717/YARN-3739-v1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0c4af0f |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9505/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9505/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9505/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9505/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9505/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9505/console |


This message was automatically generated.

> Add recovery of reservation system to RM failover process
> -
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3739-v1.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4284:
-

 Summary: condition for AM blacklisting is too narrow
 Key: YARN-4284
 URL: https://issues.apache.org/jira/browse/YARN-4284
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Sangjin Lee


Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
next app attempt can be assigned to a different node.

However, currently the condition under which the node gets blacklist is limited 
to {{DISKS_FAILED}}. There are a whole host of other issues that may cause the 
failure, for which we want to locate the AM elsewhere; e.g. disks full, JVM 
crashes, memory issues, etc.

Since the AM blacklisting is per-app, there is little practical downside in 
blacklisting the nodes on *any failure* (although it might lead to blacklisting 
the node more aggressively than necessary). I would propose locating the next 
app attempt to a different node on any failure.
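
A hedged sketch of the broadened condition proposed here (hypothetical helper name, not the attached patch): blacklist the AM's node on any abnormal container exit rather than only on DISKS_FAILED, while skipping exits that are clearly not the node's fault.

{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

final class AmBlacklistSketch {
  // Hypothetical helper, illustrative only.
  static boolean shouldBlacklistNodeForAm(int amContainerExitStatus) {
    switch (amContainerExitStatus) {
      case ContainerExitStatus.SUCCESS:
      case ContainerExitStatus.PREEMPTED:
      case ContainerExitStatus.ABORTED:
        return false; // not a sign of a bad node
      default:
        // DISKS_FAILED, INVALID (-1000), JVM crashes, etc.: try another node.
        return true;
    }
  }
}
{code}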



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4284:
--
Description: 
Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
next app attempt can be assigned to a different node.

However, currently the condition under which the node gets blacklisted is 
limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
full, JVM crashes, memory issues, etc.

Since the AM blacklisting is per-app, there is little practical downside in 
blacklisting the nodes on *any failure* (although it might lead to blacklisting 
the node more aggressively than necessary). I would propose locating the next 
app attempt to a different node on any failure.

  was:
Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
next app attempt can be assigned to a different node.

However, currently the condition under which the node gets blacklist is limited 
to {{DISKS_FAILED}}. There are a whole host of other issues that may cause the 
failure, for which we want to locate the AM elsewhere; e.g. disks full, JVM 
crashes, memory issues, etc.

Since the AM blacklisting is per-app, there is little practical downside in 
blacklisting the nodes on *any failure* (although it might lead to blacklisting 
the node more aggressively than necessary). I would propose locating the next 
app attempt to a different node on any failure.


> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-4284:
-

Assignee: Sangjin Lee

> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4284:
--
Attachment: YARN-4284.001.patch

v.1 patch

> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966234#comment-14966234
 ] 

Hudson commented on YARN-3985:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #520 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/520/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966240#comment-14966240
 ] 

Sunil G commented on YARN-4284:
---

Hi [~sjlee0]
As part of YARN-2293 we were looking at a proposal to score NMs based on their performance (more attempt failures, launch failures, disk crashes etc. would decrement a node's score), and for every application's AM it is then best to schedule onto the highest-ranked NM (the best performer so far).
But that is a very generic proposal, and we planned to get there step by step; YARN-2005 was the first step, as suggested by [~jlowe]. Your proposal is very much the natural next step, and it can give a better probability of a successful AM container launch for all applications. Currently there is still a chance that a new application's first AM attempt fails and only the second one succeeds, because AM blacklisting only helps after a failure has been seen.
The difficulty is separating general failures (disk crashes, JVM launch problems) from application-specific errors (some AM containers may not run on a node because of its memory requirements or other factors). If we cannot separate them, one app-specific problem with an AM on a node may get that node blacklisted for all other apps, which would be dangerous.
So, in line with your thought, I think we can collect the other container launch issues, segregate the generic errors, and blacklist for a period of time. +1 for this. Thoughts?
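
To make that concrete, here is a minimal sketch of the kind of segregation I mean. {{NodeHealthTracker}}, its method names, and the 30-minute expiry are purely illustrative assumptions and are not part of any patch; only the {{ContainerExitStatus}} constants come from YARN.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

/**
 * Sketch only: a hypothetical helper showing how generic node problems could be
 * segregated from app-specific failures and blacklisted for a bounded period.
 */
public class NodeHealthTracker {

  // Assumption for illustration: blacklist a node for 30 minutes.
  private static final long BLACKLIST_EXPIRY_MS = 30 * 60 * 1000L;

  // node host -> wall-clock time (ms) until which AMs should avoid the node
  private final Map<String, Long> blacklistedUntil = new ConcurrentHashMap<String, Long>();

  /** Exit codes that point at the node itself rather than the application. */
  static boolean isGenericNodeFailure(int exitStatus) {
    switch (exitStatus) {
    case ContainerExitStatus.DISKS_FAILED: // bad local dirs on the node
    case ContainerExitStatus.INVALID:      // -1000, launch never completed
      return true;
    default:
      return false;                        // treat everything else as app-specific
    }
  }

  public void onAmContainerFailure(String node, int exitStatus, long nowMs) {
    if (isGenericNodeFailure(exitStatus)) {
      blacklistedUntil.put(node, nowMs + BLACKLIST_EXPIRY_MS);
    }
  }

  public boolean isBlacklisted(String node, long nowMs) {
    Long until = blacklistedUntil.get(node);
    return until != null && nowMs < until;
  }
}
{code}

The idea is simply that only exit codes we are confident are node-generic feed a time-bounded blacklist, while everything else is treated as app-specific and ignored.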

> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966248#comment-14966248
 ] 

Sangjin Lee commented on YARN-4284:
---

Hi [~sunilg], thanks for the comment. Yes, I've been following the discussion on YARN-2005 as well as YARN-2293. Although it would be nice to have a reliable scoring mechanism as a basis for assigning AM containers, what's implemented in YARN-2005 is actually a pretty solid solution to this problem. By the way, this is one of the more common issues our users encounter.

The only problem with YARN-2005 is that the blacklisting condition is too narrow. In fact, we rarely encounter the DISKS_FAILED error; it's usually INVALID (-1000) or some other error. We could try to be really precise and blacklist a node only if the container exit status is purely due to the node itself and not caused by the app, but maintaining that precise condition may prove brittle.

IMO the key is that the blacklisting implemented in YARN-2005 is *per-app*. As such, we can afford to be more aggressive instead of trying to come up with a 100% accurate blacklisting condition. Since it is per-app, there is no risk that one bad app can cause a node to be blacklisted for all other apps (correct me if I'm wrong). Thoughts? Do you see any other risks in taking this approach?
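
For illustration, a minimal sketch of that aggressive per-app policy. {{PerAppAmBlacklist}} and its method names are hypothetical and this is not the YARN-2005 or YARN-4284 code; only {{ContainerExitStatus}} is a real YARN type.

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

/**
 * Sketch only: a hypothetical per-application blacklist illustrating the
 * "blacklist on any failure" policy discussed above.
 */
public class PerAppAmBlacklist {

  // Nodes this application should avoid for its next AM attempt.
  private final Set<String> blacklistedNodes = new HashSet<String>();

  /** Aggressive policy: any AM container exit other than SUCCESS blacklists the node. */
  public void onAmContainerCompleted(String node, int exitStatus) {
    if (exitStatus != ContainerExitStatus.SUCCESS) {
      blacklistedNodes.add(node);
    }
  }

  /** Nodes to pass to the scheduler as blacklist additions for the next attempt. */
  public Set<String> getBlacklistAdditions() {
    return Collections.unmodifiableSet(blacklistedNodes);
  }
}
{code}

Because the set lives with the application, the worst case of being too aggressive is that this one app skips a usable node; no other app is affected.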

> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966255#comment-14966255
 ] 

Hudson commented on YARN-3985:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2457 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2457/])
YARN-3985. Make ReservationSystem persist state using RMStateStore (arun 
suresh: rev 506d1b1dbcb7ae5dad4a3dc4d415af241c72887c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java


> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch, 
> YARN-3985.004.patch, YARN-3985.005.patch, YARN-3985.005.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966288#comment-14966288
 ] 

Sunil G commented on YARN-4284:
---

Thank you [~sjlee0] for the comments.
Yes, I understood your point, and the patch made it clear as well. I had assumed we were looking at general blacklisting for all apps based on a failure of a single app attempt on a node. Thank you for clarifying.

The change looks almost fine to me, but as you said, the solution is slightly aggressive in marking a node as blacklisted per app. I am also worried about cases like preemption by the RM ({{ContainerExitStatus.PREEMPTED}} or {{KILLED_BY_RESOURCEMANAGER}}). Due to queue over-usage the RM may select the AM container for preemption (this is very unlikely with YARN-1496, but still possible), and if the application then marks that node as blacklisted because of preemption or similar cases, that does not seem correct to me. How do you feel?
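
To illustrate what I mean, here is a minimal sketch that leaves RM-initiated exits out of the decision. {{AmBlacklistPolicy.shouldBlacklistForApp}} is a hypothetical helper written for this comment, not an existing YARN method; only the {{ContainerExitStatus}} constants come from YARN.

{code:java}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

/**
 * Sketch only: one possible refinement of the per-app policy that keeps
 * RM-initiated exits from blacklisting a perfectly healthy node.
 */
public final class AmBlacklistPolicy {

  private AmBlacklistPolicy() {
  }

  public static boolean shouldBlacklistForApp(int exitStatus) {
    switch (exitStatus) {
    case ContainerExitStatus.SUCCESS:                   // nothing failed
    case ContainerExitStatus.PREEMPTED:                 // RM preempted the AM
    case ContainerExitStatus.KILLED_BY_RESOURCEMANAGER: // RM-initiated kill
      return false;
    default:
      return true; // anything else blacklists the node, for this app only
    }
  }
}
{code}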



> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966323#comment-14966323
 ] 

Hadoop QA commented on YARN-4284:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m  0s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 57s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  11m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 58s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 38s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 104m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12767725/YARN-4284.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0c4af0f |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9506/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9506/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9506/console |


This message was automatically generated.

> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt to a different node on any failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)