[jira] [Commented] (YARN-3459) Fix failiure of TestLog4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487812#comment-14487812 ] Hudson commented on YARN-3459: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #159 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/159/]) YARN-3459. Fix failiure of TestLog4jWarningErrorMetricsAppender. (Varun Vasudev via wangda) (wangda: rev 7af086a515d573dc90ea4deec7f4e3f23622e0e8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java * hadoop-yarn-project/CHANGES.txt Fix failiure of TestLog4jWarningErrorMetricsAppender Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Varun Vasudev Priority: Blocker Fix For: 2.8.0 Attachments: apache-yarn-3459.0.patch TestLog4jWarningErrorMetricsAppender fails with the following message:
{code}
Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender)  Time elapsed: 2.01 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3431: -- Attachment: (was: YARN-3431.2.patch) Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487265#comment-14487265 ] Chengbing Liu commented on YARN-3266: - [~jianhe], would you like to take a look at this? Thanks! RMContext inactiveNodes should have NodeId as map key - Key: YARN-3266 URL: https://issues.apache.org/jira/browse/YARN-3266 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3266.01.patch, YARN-3266.02.patch Under the default NM port configuration, which is 0, we have observed in the current version that the "lost nodes" count is greater than the length of the lost node list. This will happen when we consecutively restart the same NM twice:
* NM started at port 10001
* NM restarted at port 10002
* NM restarted at port 10003
* NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element
* NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element
Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this would break the current API, then the key string should include the NM's port as well. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
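To make the host-keyed vs NodeId-keyed behaviour described above concrete, here is a small, self-contained sketch (the class and values are illustrative only and not part of the YARN-3266 patch; it only assumes the YARN API jar for {{NodeId}}):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.NodeId;

public class InactiveNodesKeySketch {
  public static void main(String[] args) {
    // Keyed by host: two NMs on the same host overwrite each other, so the
    // map keeps 1 entry while the lost-NM metric counts 2.
    ConcurrentMap<String, NodeId> byHost = new ConcurrentHashMap<String, NodeId>();
    byHost.put("host1", NodeId.newInstance("host1", 10001));
    byHost.put("host1", NodeId.newInstance("host1", 10002));
    System.out.println("keyed by host: " + byHost.size());      // 1

    // Keyed by NodeId (host:port): one entry per NM instance, matching the metric.
    ConcurrentMap<NodeId, String> byNodeId = new ConcurrentHashMap<NodeId, String>();
    byNodeId.put(NodeId.newInstance("host1", 10001), "lost");
    byNodeId.put(NodeId.newInstance("host1", 10002), "lost");
    System.out.println("keyed by NodeId: " + byNodeId.size());  // 2
  }
}
{code}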
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487314#comment-14487314 ] Junping Du commented on YARN-3391: -- Thanks [~vrushalic] for review, v5 patch LGTM too. [~vinodkv], any additional comments? Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch, YARN-3391.4.patch, YARN-3391.5.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487581#comment-14487581 ] Hadoop QA commented on YARN-3301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724229/YARN-3301.2.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7273//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7273//console This message is automatically generated. Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3301.1.patch, YARN-3301.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart
[ https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487979#comment-14487979 ] Siqi Li commented on YARN-3468: --- Could anyone share some comments on this jira? NM should not blindly rename usercache/filecache/nmPrivate on restart - Key: YARN-3468 URL: https://issues.apache.org/jira/browse/YARN-3468 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3468.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487199#comment-14487199 ] Hudson commented on YARN-3465: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/149/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
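As background for why the map type matters in the change above, a tiny standalone illustration of insertion-order preservation (the resource names are made up and these are not the actual {{ContainerImpl}} types):
{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ResourceOrderSketch {
  public static void main(String[] args) {
    // HashMap iteration order is unspecified, so pending resource requests could be
    // walked in a different order than they were added.
    Map<String, String> unordered = new HashMap<String, String>();
    // LinkedHashMap iterates in insertion order, which is what preserving the
    // order of LocalResourceRequest additions requires.
    Map<String, String> ordered = new LinkedHashMap<String, String>();
    for (String r : new String[] {"job.jar", "job.xml", "archive.tgz"}) {
      unordered.put(r, "PENDING");
      ordered.put(r, "PENDING");
    }
    System.out.println("HashMap order:       " + unordered.keySet());
    System.out.println("LinkedHashMap order: " + ordered.keySet());
  }
}
{code}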
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487802#comment-14487802 ] Wangda Tan commented on YARN-3434: -- [~tgraves], you're right. But I'm wondering why this could happen: When continuousReservation is enabled, it will do this check in assignContainer:
{code}
if (reservationsContinueLooking && rmContainer == null) {
  // we could possibly ignoring parent queue capacity limits when
  // reservationsContinueLooking is set.
  // If we're trying to reserve a container here, not container will be
  // unreserved for reserving the new one. Check limits again before
  // reserve the new container
  if (!checkLimitsToReserve(clusterResource, application, capability)) {
    return Resources.none();
  }
}
{code}
When continuousReservation is disabled, assignContainers will ensure user-limit will not be violated. My point is, *user-limit and queue max capacity are all checked before reserving a new container*. And an allocation from a reserved container will unreserve it before continuing. So I think in your case, https://issues.apache.org/jira/browse/YARN-3434?focusedCommentId=14485834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14485834: job-2 cannot reserve 25 * 12 GB containers. Did I miss anything? And I've a question about continuous reservation checking behavior, which may or may not be related to this issue: Now it will try to unreserve all containers under a user, but actually it will only unreserve at most one container to allocate a new container. Do you think it is fine to change the logic to be: When (continuousReservation-enabled) && (user.usage + required - min(max-allocation, user.total-reserved) <= user.limit), assignContainers will continue. This will prevent doing impossible allocations when a user has reserved lots of containers. (Same as the queue reservation checking.) Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch ULF was set to 1.0. User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
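A rough sketch of the gate proposed above, written out as plain Java over memory sizes (all names here are hypothetical and not CapacityScheduler APIs):
{code}
// Hypothetical illustration of the proposed continuous-reservation check; it is
// not actual CapacityScheduler code and uses plain longs in place of Resource.
static boolean shouldKeepAssigning(boolean continuousReservationEnabled,
    long userUsage, long required, long maxAllocation,
    long userTotalReserved, long userLimit) {
  if (!continuousReservationEnabled) {
    return true; // the existing user-limit checks in assignContainers still apply
  }
  // At most one reserved container (capped at max-allocation) can be unreserved
  // to satisfy this request, so only that much is credited back against the limit.
  long creditableReservation = Math.min(maxAllocation, userTotalReserved);
  return userUsage + required - creditableReservation <= userLimit;
}
{code}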
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.5.patch Uploaded a new patch fixing the scrolling issue in the sort, fields screen. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch, apache-yarn-3348.5.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3301: Attachment: YARN-3301.2.patch Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3301.1.patch, YARN-3301.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487992#comment-14487992 ] Varun Vasudev commented on YARN-3348: - Hit submit a little too soon. Deleted the patch I uploaded. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3431: -- Attachment: YARN-3431.3.patch Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488180#comment-14488180 ] Hudson commented on YARN-3055: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7552 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7552/]) YARN-3055. Fixed ResourceManager's DelegationTokenRenewer to not stop token renewal of applications part of a bigger workflow. Contributed by Daryn Sharp. (vinodkv: rev 9c5911294e0ba71aefe4763731b0e780cde9d0ca) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487620#comment-14487620 ] Daryn Sharp commented on YARN-3055: --- Two apps could double renew tokens (completely benign) before this patch. In practice the possibility is slim and its harmless. However, currently it's quite buggy. Both apps renewed and then stomped over each other's dttrs in allTokens. Now both apps reference separate yet equivalent dttr instances, when the intention was only one app should reference a token. A second/duplicate timer task was also scheduled. Haven't bothered to check later fallout from the inconsistencies. Patch: A double renew can still occur (unavoidable) but only one timer is scheduled. All apps reference the same dttr instance. Moving the logic down only creates 3 loops instead of 2 loops but I'll do if you feel strongly. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
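To illustrate the "one shared renewal entry per token, one timer" intent described above, a small made-up model (the names are invented and this is not the DelegationTokenRenewer implementation):
{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only: all apps sharing a token reference the same entry,
// the renewal timer is scheduled once, and the entry survives until the last
// referencing app finishes.
public class SharedTokenRenewalSketch {
  static final class TokenEntry {
    final Set<String> referencingApps = ConcurrentHashMap.newKeySet();
    boolean renewTimerScheduled;
  }

  private final Map<String, TokenEntry> allTokens = new ConcurrentHashMap<>();

  synchronized void appSubmitted(String tokenId, String appId) {
    TokenEntry entry = allTokens.computeIfAbsent(tokenId, k -> new TokenEntry());
    entry.referencingApps.add(appId);
    if (!entry.renewTimerScheduled) {
      entry.renewTimerScheduled = true;   // schedule the renewal timer exactly once
    }
  }

  synchronized void appFinished(String tokenId, String appId) {
    TokenEntry entry = allTokens.get(tokenId);
    if (entry == null) {
      return;
    }
    entry.referencingApps.remove(appId);
    // Keep the entry (and its timer) while other apps still share the token;
    // only drop it when the last referencing app is gone.
    if (entry.referencingApps.isEmpty()) {
      allTokens.remove(tokenId);
    }
  }
}
{code}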
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.4.patch Uploaded a new patch which fixes an issue with yarn top output not clearing itself correctly. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487818#comment-14487818 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #159 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/159/]) YARN-2901 addendum: Fixed findbugs warning caused by previously patch (wangda: rev ba9ee22ca4ed2c5ff447b66b2e2dfe25f6880fe0) * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about - 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender?(I'm open to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
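Since the description above suggests writing a custom appender, here is a minimal log4j 1.2 sketch of the idea (illustrative only; the real {{Log4jWarningErrorMetricsAppender}} additionally buckets counts by time window and purges old entries):
{code}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

// Minimal sketch: count WARN and ERROR log events so a web UI page can report them.
public class WarnErrorCountingAppender extends AppenderSkeleton {
  private final AtomicLong warnings = new AtomicLong();
  private final AtomicLong errors = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    if (event.getLevel().equals(Level.WARN)) {
      warnings.incrementAndGet();
    } else if (event.getLevel().isGreaterOrEqual(Level.ERROR)) {
      errors.incrementAndGet();
    }
  }

  @Override public void close() { }
  @Override public boolean requiresLayout() { return false; }

  public long getWarningCount() { return warnings.get(); }
  public long getErrorCount() { return errors.get(); }
}
{code}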
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.5.patch Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch, apache-yarn-3348.5.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487824#comment-14487824 ] Hudson commented on YARN-2890: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #159 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/159/]) YARN-2890. MiniYarnCluster should turn on timeline service if configured to do so. Contributed by Mit Desai. (hitesh: rev 265ed1fe804743601a8b62cabc1e4dc2ec8e502f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
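The fix described above boils down to consulting the timeline-service flag before starting that service; a minimal standalone sketch of reading the flag (the class name is made up and the actual MiniYARNCluster wiring is omitted):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineFlagSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Only start the timeline service when the flag is set; otherwise skip it.
    boolean timelineEnabled = conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
    System.out.println("start ApplicationHistoryServer? " + timelineEnabled);
  }
}
{code}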
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487811#comment-14487811 ] Hudson commented on YARN-3465: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #159 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/159/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488023#comment-14488023 ] Varun Vasudev commented on YARN-3348: - Uploaded a new patch that fixes a scrolling issue and makes the new method in YarnClient abstract. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch, apache-yarn-3348.5.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486909#comment-14486909 ] Hudson commented on YARN-3465: -- FAILURE: Integrated in Hadoop-trunk-Commit #7545 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7545/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3471) Fix timeline client retry
[ https://issues.apache.org/jira/browse/YARN-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3471: -- Attachment: YARN-3471.1.patch Uploaded a patch to fix the problem, and added test cases to verify that the retry works properly. Fix timeline client retry - Key: YARN-3471 URL: https://issues.apache.org/jira/browse/YARN-3471 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3471.1.patch I found that the client retry has some problems: 1. The new put methods will retry on all exceptions, but they should only do it upon ConnectException. 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
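For reference, a small sketch of what "retry only upon ConnectException" means in practice (a hypothetical helper, not the actual {{TimelineClientConnectionRetry}} API):
{code}
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical retry helper: only ConnectException is retried, up to maxRetries;
// every other exception propagates immediately. Not the real timeline client code.
static <T> T retryOnConnect(int maxRetries, long retryIntervalMs, Callable<T> op)
    throws Exception {
  for (int attempt = 0; ; attempt++) {
    try {
      return op.call();
    } catch (ConnectException e) {
      if (attempt >= maxRetries) {
        throw e;                      // retries exhausted, give up
      }
      Thread.sleep(retryIntervalMs);  // back off, then try the connection again
    }
  }
}
{code}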
[jira] [Commented] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487915#comment-14487915 ] Hudson commented on YARN-2890: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2108 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2108/]) YARN-2890. MiniYarnCluster should turn on timeline service if configured to do so. Contributed by Mit Desai. (hitesh: rev 265ed1fe804743601a8b62cabc1e4dc2ec8e502f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487357#comment-14487357 ] Hadoop QA commented on YARN-3348: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724206/apache-yarn-3348.3.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7270//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7270//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: (was: apache-yarn-3348.5.patch) Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3469) Do not set watch for most cases in ZKRMStateStore
Jun Gong created YARN-3469: -- Summary: Do not set watch for most cases in ZKRMStateStore Key: YARN-3469 URL: https://issues.apache.org/jira/browse/YARN-3469 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Priority: Minor In ZKRMStateStore, most operations (e.g. {{getDataWithRetries}}) set watches on znodes. A large number of watches will cause problems such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail|https://issues.apache.org/jira/browse/ZOOKEEPER-706]. Although there is a workaround of setting jute.maxbuffer to a larger value, we would need to keep adjusting this value as more apps and attempts are stored in ZK. And those watches are useless now. It might be better not to set watches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
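For context, the relevant ZooKeeper client calls take a boolean watch flag, so "do not set watches" amounts to passing false on the read paths; a minimal illustrative sketch (not the actual ZKRMStateStore code):
{code}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Illustrative only: reading a znode without registering a watch, so loading a
// large RM state store does not accumulate watches on the ZooKeeper session.
static byte[] readWithoutWatch(ZooKeeper zk, String znodePath) throws Exception {
  Stat stat = new Stat();
  return zk.getData(znodePath, false /* no watch */, stat);
}
{code}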
[jira] [Commented] (YARN-3469) Do not set watch for most cases in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487515#comment-14487515 ] Hadoop QA commented on YARN-3469: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724219/YARN-3469.01.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7271//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7271//console This message is automatically generated. Do not set watch for most cases in ZKRMStateStore - Key: YARN-3469 URL: https://issues.apache.org/jira/browse/YARN-3469 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Attachments: YARN-3469.01.patch In ZKRMStateStore, most operations(e.g. getDataWithRetries, getDataWithRetries, getDataWithRetries) set watches on znode. Large watches will cause problem such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail|https://issues.apache.org/jira/browse/ZOOKEEPER-706]. Although there is a workaround that setting jute.maxbuffer to a larger value, we need to adjust this value once there are more app and attempts stored in ZK. And those watches are useless now. It might be better that do not set watches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487631#comment-14487631 ] Craig Welch commented on YARN-3318: --- bq. ...Do we really see non-comparator based ordering-policy. We are unnecessarily adding two abstractions - adding policies and comparators... In the context of the code so far, the comparator based approach is specific to compounding comparators to achieve functionality (priority + fifo, fair + fifo, etc). This was the initial motivation for the two level api configuration, the broader surface of the policy which would allow for different collection types, sorting on demand, etc, (the original policy) and the narrower one within that (comparator) for the cases where comparator logic was sufficient, e.g. sharing a collection (for composition) and a collection type (a tree, for efficient resorting of individual elements when required) was possible. The two level api configuration was not well received. Offline Wangda has indicated that he thinks there are policies coming up which will need the wider, initial api, with control over the collection, sorting, etc. Supporting policy composition for those cases would be very awkward is not really worth pursuing. The various competing tradeoffs, the aversion to a multilevel api, the need for the higher level api, and the ability to compose policies creates something of a tension, I don't think it's realistic to try and accomplish it all together, the result will be Frankensteinian at best. Something has to go. Originally, I chose the multilevel api to resolve the dilemma, I like that choice, it seems unpopular with the crowd. Given that, the other optional dynamic is the ability to compose policies (there's no requirement for either of these as far as I can tell, it is a bonus feature). While I like the composition approach, it can't be maintained as such with the broader api and without the multilevel config/api. As one of these has to go and it appears it can't be the broader api or the multilevel api I suppose it will have to be composition. Internally there can be some composition of course, but it won't be transparent/exposed/configurable as it was initially. I'll put out a patch with that in a bit. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3466) Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column
[ https://issues.apache.org/jira/browse/YARN-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488039#comment-14488039 ] Jason Lowe commented on YARN-3466: -- This is really low risk, and Wangda and I both manually tested the fix. This was broken in 2.7 and it would be great to fix it in the same release so we don't regress in a public release. Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column - Key: YARN-3466 URL: https://issues.apache.org/jira/browse/YARN-3466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3466.001.patch The ResourceManager does not support sorting by the node HTTP address, container count and node label column on the cluster nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3470) Make PermissionStatusFormat public
Arun Suresh created YARN-3470: - Summary: Make PermissionStatusFormat public Key: YARN-3470 URL: https://issues.apache.org/jira/browse/YARN-3470 Project: Hadoop YARN Issue Type: Bug Reporter: Arun Suresh Priority: Minor Implementations of {{INodeAttributeProvider}} are required to provide an implementation of the {{getPermissionLong()}} method. Unfortunately, the long permission format is an encoding of the user, group and mode, with each field converted to an int using {{SerialNumberManager}}, which is package protected. Thus it would be nice to make the {{PermissionStatusFormat}} enum public (and also make the {{toLong()}} static method public) so that user-specified implementations of {{INodeAttributeProvider}} may use it. This would also make it more consistent with {{AclStatusFormat}}, which I guess has been made public for the same reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
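To illustrate the kind of encoding being discussed, a purely made-up example of packing user/group/mode serial numbers into one long; the real {{PermissionStatusFormat}} field widths and {{SerialNumberManager}} lookups in HDFS differ:
{code}
// Made-up bit layout, only to show the shape of a (user, group, mode) -> long encoding.
static long toLongSketch(int userSerial, int groupSerial, short mode) {
  return ((long) userSerial << 40) | ((long) groupSerial << 16) | (mode & 0xFFFFL);
}
{code}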
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487181#comment-14487181 ] Hudson commented on YARN-3465: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2090 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2090/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3466) Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column
[ https://issues.apache.org/jira/browse/YARN-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487714#comment-14487714 ] Vinod Kumar Vavilapalli commented on YARN-3466: --- bq. It would be nice to get this into 2.7. Seeing as how long 2.7.0 release has taken, I propose that we put this in 2.7.1. I'll start a discussion on the dev lists to immediately follow up 2.7.0 with a 2.7.1 within 2-3 weeks. That works? Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column - Key: YARN-3466 URL: https://issues.apache.org/jira/browse/YARN-3466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3466.001.patch The ResourceManager does not support sorting by the node HTTP address, container count and node label column on the cluster nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3465: --- Summary: Use LinkedHashMap to preserve order of resource requests (was: use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl) Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488149#comment-14488149 ] Vinod Kumar Vavilapalli commented on YARN-3055: --- bq. Not related to this patch, it's a bug in YARN-2704 .We should remove the token from the allTokens, otherwise, it's a leak in allTokens. it can be fixed separately. Good catch. Agree that this is not related to this patch, can you please file a ticket? Checking this in now. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated YARN-3055: -- Attachment: YARN-3055.patch The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3471) Fix timeline client retry
[ https://issues.apache.org/jira/browse/YARN-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488138#comment-14488138 ] Steve Loughran commented on YARN-3471: --
# all raised exceptions need to include the URL of the timeline server in them. Otherwise nobody will ever be able to track down the problem if it's any
# you can actually test TimelineClient (or any Yarn service) in a try-with-resources clause, to get the service automatically stopped ({{Service extends Closeable}}), see
{code}
try (TimelineClient client = createClient()) {
}
{code}
Fix timeline client retry - Key: YARN-3471 URL: https://issues.apache.org/jira/browse/YARN-3471 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3471.1.patch I found that the client retry has some problems: 1. The new put methods will retry on all exceptions, but they should only do it upon ConnectException. 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487642#comment-14487642 ] Hadoop QA commented on YARN-3348: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724231/apache-yarn-3348.4.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7274//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7274//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.52.patch Update, removing composition in favor of broader interface Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3469) Do not set watch for most cases in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487989#comment-14487989 ] Karthik Kambatla commented on YARN-3469: When working on YARN-2716, I was wondering about the same. I think not setting watches makes sense. I'll let [~jianhe] also comment before committing this. Do not set watch for most cases in ZKRMStateStore - Key: YARN-3469 URL: https://issues.apache.org/jira/browse/YARN-3469 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Attachments: YARN-3469.01.patch In ZKRMStateStore, most operations(e.g. getDataWithRetries, getDataWithRetries, getDataWithRetries) set watches on znode. Large watches will cause problem such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail|https://issues.apache.org/jira/browse/ZOOKEEPER-706]. Although there is a workaround that setting jute.maxbuffer to a larger value, we need to adjust this value once there are more app and attempts stored in ZK. And those watches are useless now. It might be better that do not set watches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
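For illustration, here is a small sketch of what the proposal means at the ZooKeeper API level, using plain ZooKeeper calls rather than the ZKRMStateStore code itself: passing watch=false reads a znode without registering a watch, so reads of app/attempt state do not accumulate watches on the session.
{code}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class NoWatchReadSketch {
  // Read znode data without leaving a watch behind (watch = false).
  public static byte[] readAppState(ZooKeeper zk, String znodePath) throws Exception {
    Stat stat = new Stat();
    return zk.getData(znodePath, false, stat);
  }
}
{code}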
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488001#comment-14488001 ] Li Lu commented on YARN-3431: - Hi [~zjshen], I checked your proposal and in general it LGTM. I have some minor concerns, however: # In general we're using v1 object model for data transferring and storage. Rebuilding the special info for subclasses may be challenging, as the special keys may be mixed with user defined keys. Even though the chance is low, we may want to find a more elegant solution on this. # How do the sub-class instances identify their own types? I think this is the core challenge here. Are we using duck typing here? That said, maybe we want to have a new data transfer type, that can accommodate the extra data in subclasses in extension fields, and can self-identify its type? I'm just thinking out loud here... Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487245#comment-14487245 ] Hudson commented on YARN-2901: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #892 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/892/]) YARN-2901 addendum: Fixed findbugs warning caused by previously patch (wangda: rev ba9ee22ca4ed2c5ff447b66b2e2dfe25f6880fe0) * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about - 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender?(I'm open to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487251#comment-14487251 ] Hudson commented on YARN-2890: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #892 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/892/]) YARN-2890. MiniYarnCluster should turn on timeline service if configured to do so. Contributed by Mit Desai. (hitesh: rev 265ed1fe804743601a8b62cabc1e4dc2ec8e502f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487640#comment-14487640 ] Hadoop QA commented on YARN-3136: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724243/00010-YARN-3136.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7275//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7275//console This message is automatically generated. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488011#comment-14488011 ] Thomas Graves commented on YARN-3434: - The code you mention is in the else part of that check, where it would do a reservation. The situation I'm talking about actually allocates a container, not reserves one. I'll try to explain better: the application asks for lots of containers. It acquires some containers, then it reserves some. At this point it hits its normal user limit, which in my example = capacity. It hasn't hit the max amount it can allocate or reserve (shouldAllocOrReserveNewContainer()). The next node that heartbeats in isn't yet reserved and has enough space to place a container on. It is first checked in assignContainers -> canAssignToThisQueue. That passes since we haven't hit max capacity. Then it checks assignContainers -> canAssignToUser. That passes, but only because used - reserved < the user limit. This allows it to continue down into assignContainer. In assignContainer the node has available space and we haven't hit shouldAllocOrReserveNewContainer(). reservationsContinueLooking is on and labels are empty, so it does the check: {noformat} if (!shouldAllocOrReserveNewContainer || Resources.greaterThan(resourceCalculator, clusterResource, minimumUnreservedResource, Resources.none())) {noformat} As I said before, it's allowed to allocate or reserve, so it passes that test. It also hasn't met its maximum capacity yet (capacity = 30% and max capacity = 100%), so minimumUnreservedResource is none and that check doesn't kick in, and it doesn't go into the block to findNodeToUnreserve(). Then it goes ahead and allocates when it should have been forced to unreserve. Basically we needed to also do the user limit check again and force it to do the findNodeToUnreserve. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch ULF was set to 1.0. User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
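A toy numeric illustration of the gap described above (made-up numbers, not CapacityScheduler code): canAssignToUser compares used minus reserved against the user limit, so a user that is already at its limit can still be handed a new container unless the user limit is re-checked in assignContainer and findNodeToUnreserve() is forced.
{code}
// Toy illustration only -- shows why the canAssignToUser check passes even
// though the user is already at its limit, per the discussion above.
public class UserLimitCheckDemo {
  public static void main(String[] args) {
    long userLimitMB = 30_000;   // hypothetical user limit (30% of a 100 GB queue)
    long usedMB      = 30_000;   // the user has already reached that limit
    long reservedMB  = 8_000;    // but 8 GB of the usage is only reserved

    boolean canAssignToUser = (usedMB - reservedMB) < userLimitMB; // true: check passes
    boolean atUserLimit     = usedMB >= userLimitMB;               // true: already at limit

    System.out.println("canAssignToUser = " + canAssignToUser);
    System.out.println("already at user limit = " + atUserLimit);
    // Without re-doing the user limit check in assignContainer and calling
    // findNodeToUnreserve(), the new allocation pushes the user past its limit.
  }
}
{code}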
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs as well as running containers logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3347: Attachment: YARN-3347.3.rebase.patch Improve YARN log command to get AMContainer logs as well as running containers logs --- Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3347.1.patch, YARN-3347.1.rebase.patch, YARN-3347.2.patch, YARN-3347.2.rebase.patch, YARN-3347.3.patch, YARN-3347.3.rebase.patch Right now, we could specify applicationId, node http address and container ID to get the specify container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486863#comment-14486863 ] Zhijie Shen commented on YARN-3431: --- I uploaded a patch that resolves the problem the other way. I thought about the sub-classes again, and found it is not necessary to have corresponding web service resources for them. In fact, there are two levels: 1. Java API level: we want to have the sub-classes of TimelineEntity as first-class citizens, which makes it easier for users to operate on the predefined entities. They may have special setters/getters. 2. REST API level: the JSON schema isn't polymorphic, so we should have one schema that is generic enough to describe different kinds of entities. Fortunately, the entity schema is able to do that. The sub-classes of TimelineEntity contain the following additional information: a) Special attributes: they can be put into the info map of the entity and treated as predefined info. For example, the queue of an application entity can be put into info with key=QUEUE_INFO_KEY and value = some queue name. b) Parent-child relationships: they can be put into the relate/is_related_to relationship map of the entity. The relate/is_related_to relationship can describe an arbitrary directed graph, and a tree is one type of directed graph. In the new patch, I fixed the API records instead of the endpoint. Therefore, we will still have a single endpoint to accept entities, while the Java APIs stay unchanged too. In terms of JSON content for communication, we will always use the generic entity schema for TimelineEntity and all of its sub-classes. BTW, I fixed some minor issues together in this patch, such as renaming UserEntity and QueueEntity, and FlowEntity attributes. Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
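A rough sketch of points a) and b) above, using plain Java maps rather than the real TimelineEntity class (whose setter names and predefined keys may differ): the queue goes into the generic info map and the parent-child link into the is_related_to map, so a single generic schema can carry the sub-class data.
{code}
// Sketch only: key names such as YARN_APPLICATION_QUEUE and YARN_FLOW_RUN are
// illustrative, not necessarily the predefined constants used by the patch.
import java.util.HashMap;
import java.util.Map;

public class GenericEntitySketch {
  public static void main(String[] args) {
    Map<String, Object> entity = new HashMap<>();
    entity.put("type", "YARN_APPLICATION");
    entity.put("id", "application_1428000000000_0001");

    // a) special attributes go into the predefined info map
    Map<String, Object> info = new HashMap<>();
    info.put("YARN_APPLICATION_QUEUE", "default");
    entity.put("info", info);

    // b) parent-child relationships go into the is_related_to map
    Map<String, String[]> isRelatedTo = new HashMap<>();
    isRelatedTo.put("YARN_FLOW_RUN", new String[] {"flow_run_1"});
    entity.put("isrelatedto", isRelatedTo);

    System.out.println(entity);
  }
}
{code}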
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487339#comment-14487339 ] Steve Loughran commented on YARN-2423: -- @rkanter: if the YARN-2444 patch gets in (do you want to review that?), all my production-side recommendations will be there. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487050#comment-14487050 ] Xuan Gong commented on YARN-3301: - bq. should it be the fix for 2.7? This is a format issue. It is OK to fix it in 2.7, but it does not need to be a blocker. Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3301.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487482#comment-14487482 ] Daryn Sharp commented on YARN-3055: --- Thanks Vinod, I'll revise this morning. The ignores shouldn't be there. I did that for our internal emergency fix because I didn't handle proxy refresh tokens, so I didn't care that the tests failed. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise the existing submitted applications which share this token will not get it renewed any more, and for newly submitted applications which share this token, the token will be renewed immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and it is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) Fix failiure of TestLog4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487239#comment-14487239 ] Hudson commented on YARN-3459: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #892 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/892/]) YARN-3459. Fix failiure of TestLog4jWarningErrorMetricsAppender. (Varun Vasudev via wangda) (wangda: rev 7af086a515d573dc90ea4deec7f4e3f23622e0e8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java Fix failiure of TestLog4jWarningErrorMetricsAppender Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Varun Vasudev Priority: Blocker Fix For: 2.8.0 Attachments: apache-yarn-3459.0.patch TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487859#comment-14487859 ] Wangda Tan commented on YARN-2801: -- I have a doc for 2.6 in apt format; I will try to cover the new changes in trunk and convert it to markdown soon. Will keep you posted. Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487551#comment-14487551 ] Junping Du commented on YARN-1376: -- Thanks [~xgong] for updating the patch. I just reviewed it and have some comments below; most are minor issues except the last one:
{code}
public static final long DEFAULT_LOG_AGGREGATION_STATUS_TIME_OUT_MS = 10*60*1000;
{code}
Add a space between the number and *. In LogAggregationReportPBImpl.java,
{code}
+  private LogAggregationStatus convertFromProtoFormat(
+      LogAggregationStatusProto s) {
+    return LogAggregationStatus.valueOf(s.name().replace(
+        LOGAGGREGATION_STATUS_PREFIX, ""));
+  }
+
+  private LogAggregationStatusProto
+      convertToProtoFormat(LogAggregationStatus s) {
+    return LogAggregationStatusProto.valueOf(LOGAGGREGATION_STATUS_PREFIX
+        + s.name());
+  }
{code}
Looks like we are adding/removing LOGAGGREGATION_STATUS_PREFIX between the java obj and the proto obj. I think this is not necessary? Am I missing something here? In NodeStatusUpdaterImpl.java,
{code}
+    if (!latestLogAggregationReports.containsKey(logAggregationReport
+        .getApplicationId())) {
       ... // A
+    } else {
       ... // B
     }
{code}
Can we remove the ! in the if condition and swap the order of A and B, which looks simpler? In uploadLogsForContainers() of AppLogAggregatorImpl.java,
{code}
    } catch (Exception e) {
      LOG.error("Failed to move temporary log file to final location: ["
          + remoteNodeTmpLogFileForApp + "] to [" + renamedPath + "]", e);
+     diagnosticMessage =
+         "Log uploaded failed for Application: " + appId
+             + " in NodeManager: "
+             + LogAggregationUtils.getNodeString(nodeId) + " at "
+             + Times.format(currentTime) + "\n";
    }
+
+   LogAggregationReport report =
+       Records.newRecord(LogAggregationReport.class);
+   report.setApplicationId(appId);
+   report.setNodeId(nodeId);
+   report.setDiagnosticMessage(diagnosticMessage);
+   if (appFinished) {
+     report.setLogAggregationStatus(LogAggregationStatus.FINISHED);
+   } else {
+     report.setLogAggregationStatus(LogAggregationStatus.RUNNING);
+   }
+   this.context.getLogAggregationStatusForApps().add(report);
{code}
Looks like we only set LogAggregationStatus to FINISHED or RUNNING here even when moving the temp log to HDFS failed. That doesn't seem correct to me. We should add a FAILED state for LogAggregationStatus to address this case? The rest looks fine to me. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure out if log aggregation is complete. This ticket is used to track the changes on the NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
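To make the last point concrete, here is a small sketch (assumed names, not the committed YARN-1376 change) of what adding a FAILED value and reporting it when the upload to HDFS fails could look like:
{code}
// Sketch only: the real enum lives in the YARN API records; FAILED is the
// value suggested above for the case where moving the temp log to HDFS fails.
public class LogAggregationStatusSketch {
  enum LogAggregationStatus { RUNNING, FINISHED, FAILED }

  static LogAggregationStatus statusFor(boolean appFinished, boolean uploadFailed) {
    if (uploadFailed) {
      return LogAggregationStatus.FAILED;      // instead of RUNNING/FINISHED
    }
    return appFinished ? LogAggregationStatus.FINISHED
                       : LogAggregationStatus.RUNNING;
  }

  public static void main(String[] args) {
    System.out.println(statusFor(true, true));    // FAILED
    System.out.println(statusFor(true, false));   // FINISHED
    System.out.println(statusFor(false, false));  // RUNNING
  }
}
{code}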
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487975#comment-14487975 ] Hadoop QA commented on YARN-3361: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723968/YARN-3361.4.patch against trunk revision 3fe61e0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7278//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7278//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7278//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7278//console This message is automatically generated. CapacityScheduler side changes to support non-exclusive node labels --- Key: YARN-3361 URL: https://issues.apache.org/jira/browse/YARN-3361 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3361.1.patch, YARN-3361.2.patch, YARN-3361.3.patch, YARN-3361.4.patch According to design doc attached in YARN-3214, we need implement following logic in CapacityScheduler: 1) When allocate a resource request with no node-label specified, it should get preferentially allocated to node without labels. 2) When there're some available resource in a node with label, they can be used by applications with following order: - Applications under queues which can access the label and ask for same labeled resource. - Applications under queues which can access the label and ask for non-labeled resource. - Applications under queues cannot access the label and ask for non-labeled resource. 3) Expose necessary information that can be used by preemption policy to make preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487238#comment-14487238 ] Hudson commented on YARN-3465: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #892 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/892/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487933#comment-14487933 ] Li Lu commented on YARN-3426: - The problem with the current solution is that we're duplicating a lot of maven code for hadoop-common/hdfs and yarn. We're also introducing duplication into mapreduce with the current approach. The next step for this work should be removing the duplication in that maven code. Meanwhile, for YARN, we may want to add maven code to generate javadocs for public APIs only, similar to hadoop-common/hdfs. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch, YARN-3426-040815.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3471) Fix timeline client retry
Zhijie Shen created YARN-3471: - Summary: Fix timeline client retry Key: YARN-3471 URL: https://issues.apache.org/jira/browse/YARN-3471 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen I found that the client retry has some problems: 1. The new put methods will retry on all exceptions, but they should only do it upon ConnectException. 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3347) Improve YARN log command to get AMContainer logs as well as running containers logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487441#comment-14487441 ] Hadoop QA commented on YARN-3347: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724216/YARN-3347.3.1.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.api.impl.TestAMRMClient Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7272//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7272//console This message is automatically generated. Improve YARN log command to get AMContainer logs as well as running containers logs --- Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3347.1.patch, YARN-3347.1.rebase.patch, YARN-3347.2.patch, YARN-3347.2.rebase.patch, YARN-3347.3.1.patch, YARN-3347.3.patch, YARN-3347.3.rebase.patch Right now, we could specify applicationId, node http address and container ID to get the specify container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488115#comment-14488115 ] Jian He commented on YARN-3055: --- Thanks Daryn, the patch looks good to me too. +1. Not related to this patch, but there is a bug in YARN-2704: we should remove the token from allTokens here, otherwise the entry leaks in allTokens. It can be fixed separately.
{code}
if (t.token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) {
  iter.remove();
  t.cancelTimer();
  LOG.info("Removed expiring token " + t);
}
{code}
The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise the existing submitted applications which share this token will not get it renewed any more, and for newly submitted applications which share this token, the token will be renewed immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and it is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
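To illustrate the leak pointed out above, here is a self-contained toy example using plain JDK classes (not the actual DelegationTokenRenewer): if the expiring token's timer is cancelled but its entry is never removed from the allTokens-style map, the map keeps growing.
{code}
// Toy example only: allTokens here is a plain map keyed by a token name,
// standing in for DelegationTokenRenewer's internal bookkeeping.
import java.util.Iterator;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

public class AllTokensCleanupSketch {
  static final Map<String, TimerTask> allTokens = new ConcurrentHashMap<>();

  public static void main(String[] args) {
    Timer timer = new Timer(true);
    TimerTask renewal = new TimerTask() { public void run() { /* renew */ } };
    allTokens.put("HDFS_DELEGATION_TOKEN-1", renewal);
    timer.schedule(renewal, 60_000);

    // Expiration path: cancel the renewal timer AND drop the map entry.
    for (Iterator<Map.Entry<String, TimerTask>> it =
             allTokens.entrySet().iterator(); it.hasNext();) {
      Map.Entry<String, TimerTask> e = it.next();
      if (e.getKey().startsWith("HDFS_DELEGATION_TOKEN")) {
        e.getValue().cancel();
        it.remove();   // without this, the entry leaks in allTokens
        System.out.println("Removed expiring token " + e.getKey());
      }
    }
    timer.cancel();
  }
}
{code}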
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487146#comment-14487146 ] Hadoop QA commented on YARN-3225: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724185/YARN-3225-4.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestTimelineClient Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7268//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7268//console This message is automatically generated. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on decommission list to decommissioning status and track timeout to terminate the nodes that haven't get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) Fix failiure of TestLog4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487121#comment-14487121 ] Hudson commented on YARN-3459: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #158 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/158/]) YARN-3459. Fix failiure of TestLog4jWarningErrorMetricsAppender. (Varun Vasudev via wangda) (wangda: rev 7af086a515d573dc90ea4deec7f4e3f23622e0e8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java * hadoop-yarn-project/CHANGES.txt Fix failiure of TestLog4jWarningErrorMetricsAppender Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Varun Vasudev Priority: Blocker Fix For: 2.8.0 Attachments: apache-yarn-3459.0.patch TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487127#comment-14487127 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #158 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/158/]) YARN-2901 addendum: Fixed findbugs warning caused by previously patch (wangda: rev ba9ee22ca4ed2c5ff447b66b2e2dfe25f6880fe0) * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about - 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender?(I'm open to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487133#comment-14487133 ] Hudson commented on YARN-2890: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #158 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/158/]) YARN-2890. MiniYarnCluster should turn on timeline service if configured to do so. Contributed by Mit Desai. (hitesh: rev 265ed1fe804743601a8b62cabc1e4dc2ec8e502f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3466) Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column
[ https://issues.apache.org/jira/browse/YARN-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487733#comment-14487733 ] Wangda Tan commented on YARN-3466: -- Just committed to trunk/branch-2. [~jlowe], [~vinodkv], please let me know when you figure out whether we need to put this into 2.7.x. Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column - Key: YARN-3466 URL: https://issues.apache.org/jira/browse/YARN-3466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3466.001.patch The ResourceManager does not support sorting by the node HTTP address, container count and node label column on the cluster nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs as well as running containers logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3347: Attachment: YARN-3347.3.1.patch Improve YARN log command to get AMContainer logs as well as running containers logs --- Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3347.1.patch, YARN-3347.1.rebase.patch, YARN-3347.2.patch, YARN-3347.2.rebase.patch, YARN-3347.3.1.patch, YARN-3347.3.patch, YARN-3347.3.rebase.patch Right now, we could specify applicationId, node http address and container ID to get the specify container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3431: -- Attachment: YARN-3431.2.patch Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3469) Do not set watch for most cases in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3469: --- Description: In ZKRMStateStore, most operations(e.g. getDataWithRetries, getDataWithRetries, getDataWithRetries) set watches on znode. Large watches will cause problem such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail|https://issues.apache.org/jira/browse/ZOOKEEPER-706]. Although there is a workaround that setting jute.maxbuffer to a larger value, we need to adjust this value once there are more app and attempts stored in ZK. And those watches are useless now. It might be better that do not set watches. was: In ZKRMStateStore, most operations(e.g. getDataWithRetries, getDataWithRetries, getDataWithRetries) set watches on znode. Large watches will cause problem such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail](https://issues.apache.org/jira/browse/ZOOKEEPER-706). Although there is a workaround that setting jute.maxbuffer to a larger value, we need to adjust this value once there are more app and attempts stored in ZK. And those watches are useless now. It might be better that do not set watches. Do not set watch for most cases in ZKRMStateStore - Key: YARN-3469 URL: https://issues.apache.org/jira/browse/YARN-3469 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Priority: Minor In ZKRMStateStore, most operations(e.g. getDataWithRetries, getDataWithRetries, getDataWithRetries) set watches on znode. Large watches will cause problem such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail|https://issues.apache.org/jira/browse/ZOOKEEPER-706]. Although there is a workaround that setting jute.maxbuffer to a larger value, we need to adjust this value once there are more app and attempts stored in ZK. And those watches are useless now. It might be better that do not set watches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3347) Improve YARN log command to get AMContainer logs as well as running containers logs
[ https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487145#comment-14487145 ] Hadoop QA commented on YARN-3347: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724191/YARN-3347.3.rebase.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7269//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7269//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7269//console This message is automatically generated. Improve YARN log command to get AMContainer logs as well as running containers logs --- Key: YARN-3347 URL: https://issues.apache.org/jira/browse/YARN-3347 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3347.1.patch, YARN-3347.1.rebase.patch, YARN-3347.2.patch, YARN-3347.2.rebase.patch, YARN-3347.3.patch, YARN-3347.3.rebase.patch Right now, we could specify applicationId, node http address and container ID to get the specify container log. Or we could only specify applicationId to get all the container logs. It is very hard for the users to get logs for AM container since the AMContainer logs have more useful information. Users need to know the AMContainer's container ID and related Node http address. We could improve the YARN Log Command to allow users to get AMContainer logs directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3225: Attachment: YARN-3225-4.patch New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on decommission list to decommissioning status and track timeout to terminate the nodes that haven't get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486854#comment-14486854 ] Hadoop QA commented on YARN-3293: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723930/apache-yarn-3293.6.patch against trunk revision b1e0590. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7267//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7267//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch, apache-yarn-3293.6.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) Fix failiure of TestLog4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487182#comment-14487182 ] Hudson commented on YARN-3459: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2090 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2090/]) YARN-3459. Fix failiure of TestLog4jWarningErrorMetricsAppender. (Varun Vasudev via wangda) (wangda: rev 7af086a515d573dc90ea4deec7f4e3f23622e0e8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java Fix failiure of TestLog4jWarningErrorMetricsAppender Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Varun Vasudev Priority: Blocker Fix For: 2.8.0 Attachments: apache-yarn-3459.0.patch TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487212#comment-14487212 ] Hudson commented on YARN-2890: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/149/]) YARN-2890. MiniYarnCluster should turn on timeline service if configured to do so. Contributed by Mit Desai. (hitesh: rev 265ed1fe804743601a8b62cabc1e4dc2ec8e502f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
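For illustration only (this is not the committed YARN-2890 patch): a minimal sketch of the kind of guard the description asks for, reading the existing yarn.timeline-service.enabled flag before a mini cluster would start the timeline service. The class and method names below are made up for the example.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineServiceGateExample {
  // Returns true only when the timeline service is explicitly enabled.
  static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false);
    // A mini cluster would consult this before adding the timeline service.
    System.out.println("start timeline service? " + shouldStartTimelineService(conf));
  }
}
{code}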
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487188#comment-14487188 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2090 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2090/]) YARN-2901 addendum: Fixed findbugs warning caused by previously patch (wangda: rev ba9ee22ca4ed2c5ff447b66b2e2dfe25f6880fe0) * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about - 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender?(I'm open to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
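As a rough illustration of the "custom appender" idea mentioned in the description (this is not the Log4jWarningErrorMetricsAppender that was actually committed), a log4j 1.x appender could simply count WARN and ERROR events for a web page to display later; the class name is hypothetical.
{code}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

public class CountingAppender extends AppenderSkeleton {
  private final AtomicLong errors = new AtomicLong();
  private final AtomicLong warnings = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    // Bucket each event by its log level.
    if (event.getLevel().isGreaterOrEqual(Level.ERROR)) {
      errors.incrementAndGet();
    } else if (event.getLevel().isGreaterOrEqual(Level.WARN)) {
      warnings.incrementAndGet();
    }
  }

  public long getErrorCount()   { return errors.get(); }
  public long getWarningCount() { return warnings.get(); }

  @Override
  public void close() { }

  @Override
  public boolean requiresLayout() { return false; }
}
{code}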
[jira] [Commented] (YARN-3459) Fix failiure of TestLog4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487200#comment-14487200 ] Hudson commented on YARN-3459: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/149/]) YARN-3459. Fix failiure of TestLog4jWarningErrorMetricsAppender. (Varun Vasudev via wangda) (wangda: rev 7af086a515d573dc90ea4deec7f4e3f23622e0e8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java Fix failiure of TestLog4jWarningErrorMetricsAppender Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Varun Vasudev Priority: Blocker Fix For: 2.8.0 Attachments: apache-yarn-3459.0.patch TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487194#comment-14487194 ] Hudson commented on YARN-2890: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2090 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2090/]) YARN-2890. MiniYarnCluster should turn on timeline service if configured to do so. Contributed by Mit Desai. (hitesh: rev 265ed1fe804743601a8b62cabc1e4dc2ec8e502f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487206#comment-14487206 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/149/]) YARN-2901 addendum: Fixed findbugs warning caused by previously patch (wangda: rev ba9ee22ca4ed2c5ff447b66b2e2dfe25f6880fe0) * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about - 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender?(I'm open to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486785#comment-14486785 ] Tsuyoshi Ozawa commented on YARN-2801: -- It would be good to add documentation under ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/*.md. Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487416#comment-14487416 ] Thomas Graves commented on YARN-3434: - [~wangda] I'm not sure I follow what you are saying. The reservations are already counted in the user's usage, and we do consider reserved resources when doing the user limit calculations. Look at LeafQueue.assignContainers: the call to allocateResource is where it ends up adding to user usage. The canAssignToUser is where it does the user limit check and subtracts the reservations off to see if it can continue. Note I do think we should just get rid of the config for reservationsContinueLooking, but that is a separate issue. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch ULF was set to 1.0. User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488061#comment-14488061 ] Thomas Graves commented on YARN-3434: - {quote} And I have a question about the continuous reservation checking behavior, which may or may not be related to this issue: now it will try to unreserve all containers under a user, but actually it will only unreserve at most one container to allocate a new container. Do you think it is fine to change the logic to be: when (continousReservation-enabled) && (user.usage + required - min(max-allocation, user.total-reserved) <= user.limit), assignContainers will continue. This will prevent doing impossible allocations when a user has reserved lots of containers. (Same as the queue reservation checking.) {quote} I do think the reservation checking and unreserving can be improved. I basically started with a very simple thing and figured we could improve it. I'm not sure how much that check would help in practice. I guess it might help the cases where you have 1 user in the queue and a second one shows up and your user limit gets decreased by a lot. In that case it may prevent it from continuing when it can short circuit here. So it would seem to be ok for that. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch ULF was set to 1.0. User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
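To make the quoted condition concrete, here is a deliberately simplified, hypothetical sketch of that short-circuit using plain memory values in MB; the real scheduler would go through the Resource and user-limit machinery in LeafQueue rather than raw longs, and all names below are made up for the example.
{code}
public class UserLimitShortCircuit {
  static boolean mayContinueAssigning(long userUsageMb, long requiredMb,
      long userReservedMb, long maxAllocationMb, long userLimitMb) {
    // Only credit back what unreserving a single container could free at most.
    long creditableReservation = Math.min(maxAllocationMb, userReservedMb);
    return userUsageMb + requiredMb - creditableReservation <= userLimitMb;
  }

  public static void main(String[] args) {
    // User at 90 GB of a 96 GB limit asks for 8 GB while holding 40 GB of
    // reservations with an 8 GB max allocation: 90 + 8 - 8 <= 96, so continue.
    System.out.println(mayContinueAssigning(92160, 8192, 40960, 8192, 98304));
  }
}
{code}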
[jira] [Commented] (YARN-3465) use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486776#comment-14486776 ] Hadoop QA commented on YARN-3465: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724015/YARN-3465.000.patch against trunk revision b1e0590. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7266//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7266//console This message is automatically generated. use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
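For readers unfamiliar with the distinction the patch relies on: LinkedHashMap iterates in insertion order, while HashMap makes no ordering guarantee. A standalone demo follows; the keys are placeholders, not the actual LocalResourceRequest entries in ContainerImpl.
{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
  public static void main(String[] args) {
    Map<String, String> ordered = new LinkedHashMap<String, String>();
    Map<String, String> unordered = new HashMap<String, String>();
    for (String key : new String[] {"job.jar", "job.xml", "dist.cache/archive"}) {
      ordered.put(key, "resource-" + key);
      unordered.put(key, "resource-" + key);
    }
    System.out.println(ordered.keySet());   // always [job.jar, job.xml, dist.cache/archive]
    System.out.println(unordered.keySet()); // iteration order is unspecified
  }
}
{code}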
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486799#comment-14486799 ] Naganarasimha G R commented on YARN-2801: - Hi [~Wangda] [~ozawa], I can help if anything is required for this. Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487898#comment-14487898 ] Junping Du commented on YARN-3391: -- Synced offline with Vinod; he is fine with the latest patch. I will go ahead and commit it soon. Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch, YARN-3391.4.patch, YARN-3391.5.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Should the flow run id be a number as opposed to a generic string? - What is the default behavior for the flow run id if it is missing (i.e. the client did not set it)? - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487902#comment-14487902 ] Hudson commented on YARN-3465: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2108 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2108/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3466) Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column
[ https://issues.apache.org/jira/browse/YARN-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487738#comment-14487738 ] Hudson commented on YARN-3466: -- FAILURE: Integrated in Hadoop-trunk-Commit #7547 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7547/]) YARN-3466. Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column. (Jason Lowe via wangda) (wangda: rev 1885141e90837252934192040a40047c7adbc1b5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/CHANGES.txt Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column - Key: YARN-3466 URL: https://issues.apache.org/jira/browse/YARN-3466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3466.001.patch The ResourceManager does not support sorting by the node HTTP address, container count and node label column on the cluster nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487047#comment-14487047 ] Xuan Gong commented on YARN-3301: - Test failures are not related Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3301.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487781#comment-14487781 ] Hadoop QA commented on YARN-3055: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724256/YARN-3055.patch against trunk revision 6495940. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7276//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7276//console This message is automatically generated. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, and we should not remove it from {{allTokens}} either. Otherwise, the existing submitted applications which share this token will not get renewed any more, and for new submitted applications which share this token, the token will be renewed immediately. For example, we have 3 applications: app1, app2, app3, and they share token1. See the following scenario: *1).* app1 is submitted first, then app2, and then app3. In this case, there is only one token renewal timer for token1, and it is scheduled when app1 is submitted. *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3465) Use LinkedHashMap to preserve order of resource requests
[ https://issues.apache.org/jira/browse/YARN-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487120#comment-14487120 ] Hudson commented on YARN-3465: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #158 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/158/]) YARN-3465. Use LinkedHashMap to preserve order of resource requests. (Zhihai Xu via kasha) (kasha: rev 6495940eae09418a939882a8955845f9241a6485) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java Use LinkedHashMap to preserve order of resource requests Key: YARN-3465 URL: https://issues.apache.org/jira/browse/YARN-3465 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3465.000.patch use LinkedHashMap to keep the order of LocalResourceRequest in ContainerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3469) Do not set watch for most cases in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3469: --- Attachment: YARN-3469.01.patch Do not set watch for most cases in ZKRMStateStore - Key: YARN-3469 URL: https://issues.apache.org/jira/browse/YARN-3469 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Attachments: YARN-3469.01.patch In ZKRMStateStore, most operations (e.g. getDataWithRetries) set watches on znodes. Large numbers of watches will cause problems such as [ZOOKEEPER-706: large numbers of watches can cause session re-establishment to fail|https://issues.apache.org/jira/browse/ZOOKEEPER-706]. Although there is a workaround of setting jute.maxbuffer to a larger value, we would need to adjust this value again once more apps and attempts are stored in ZK. And those watches are useless now. It might be better not to set watches at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
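The gist of the proposed change, sketched against the plain ZooKeeper client API (this is not the ZKRMStateStore code itself): read znodes with watch=false so no watcher is registered per read. Method and path names here are illustrative.
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class NoWatchReadExample {
  static byte[] readAppState(ZooKeeper zk, String znodePath)
      throws KeeperException, InterruptedException {
    Stat stat = new Stat();
    // watch=false: no watcher is registered, so session re-establishment
    // does not have to replay thousands of watches (see ZOOKEEPER-706).
    return zk.getData(znodePath, false, stat);
  }
}
{code}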
[jira] [Commented] (YARN-3459) Fix failiure of TestLog4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487903#comment-14487903 ] Hudson commented on YARN-3459: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2108 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2108/]) YARN-3459. Fix failiure of TestLog4jWarningErrorMetricsAppender. (Varun Vasudev via wangda) (wangda: rev 7af086a515d573dc90ea4deec7f4e3f23622e0e8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java * hadoop-yarn-project/CHANGES.txt Fix failiure of TestLog4jWarningErrorMetricsAppender Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Varun Vasudev Priority: Blocker Fix For: 2.8.0 Attachments: apache-yarn-3459.0.patch TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487909#comment-14487909 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2108 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2108/]) YARN-2901 addendum: Fixed findbugs warning caused by previously patch (wangda: rev ba9ee22ca4ed2c5ff447b66b2e2dfe25f6880fe0) * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about - 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 hours/day By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender?(I'm open to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 00010-YARN-3136.patch Yes [~jianhe], I added that to fix a findbugs warning, and it is not needed. I have updated the patch as per the initial understanding. Kindly check. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
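One hypothetical way to remove the scheduler lock from this path, sketched only to illustrate the direction discussed in the description (it is not what any of the attached patches necessarily do): keep transferred containers in a concurrent map keyed by attempt id so AM registration reads never contend with scheduling. All names below are placeholders.
{code}
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// C stands in for the RM's Container type.
public class TransferredContainerStore<C> {
  private final ConcurrentMap<String, List<C>> byAttemptId =
      new ConcurrentHashMap<String, List<C>>();

  public void recordTransferred(String attemptId, C container) {
    List<C> list = byAttemptId.get(attemptId);
    if (list == null) {
      List<C> fresh = new CopyOnWriteArrayList<C>();
      List<C> existing = byAttemptId.putIfAbsent(attemptId, fresh);
      list = (existing == null) ? fresh : existing;
    }
    list.add(container);
  }

  // Lock-free read path that AM registration could use.
  public List<C> getTransferredContainers(String attemptId) {
    List<C> list = byAttemptId.get(attemptId);
    return list == null ? Collections.<C>emptyList()
                        : Collections.unmodifiableList(list);
  }
}
{code}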
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488206#comment-14488206 ] Hadoop QA commented on YARN-3348: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724296/apache-yarn-3348.5.patch against trunk revision 61dc2ea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7279//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7279//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch, apache-yarn-3348.4.patch, apache-yarn-3348.5.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487871#comment-14487871 ] Vinod Kumar Vavilapalli commented on YARN-3055: --- You are right that the previous code also had the same issue. I am good with the patch. Will check this in unless jenkins or [~jianhe] say no. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch, YARN-3055.patch, YARN-3055.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, and we should not remove it from {{allTokens}} either. Otherwise, the existing submitted applications which share this token will not get renewed any more, and for new submitted applications which share this token, the token will be renewed immediately. For example, we have 3 applications: app1, app2, app3, and they share token1. See the following scenario: *1).* app1 is submitted first, then app2, and then app3. In this case, there is only one token renewal timer for token1, and it is scheduled when app1 is submitted. *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
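A hypothetical sketch of the invariant under discussion, independent of the actual DelegationTokenRenewer code: track which applications reference each shared token and only cancel the renewal timer when the last reference is removed. Class, method, and timer-wiring names are made up for the example.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// T stands in for the delegation token type.
public class SharedTokenTracker<T> {
  private final Map<T, Set<String>> appsPerToken = new HashMap<T, Set<String>>();

  public synchronized void applicationAdded(T token, String appId) {
    Set<String> apps = appsPerToken.get(token);
    if (apps == null) {
      apps = new HashSet<String>();
      appsPerToken.put(token, apps);
      scheduleRenewal(token);          // first reference: start the renewal timer
    }
    apps.add(appId);
  }

  public synchronized void applicationFinished(T token, String appId) {
    Set<String> apps = appsPerToken.get(token);
    if (apps == null) {
      return;
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      appsPerToken.remove(token);
      cancelRenewal(token);            // last reference gone: stop the timer
    }
  }

  // Placeholders for the real timer wiring.
  private void scheduleRenewal(T token) { /* schedule a renewal TimerTask */ }
  private void cancelRenewal(T token)   { /* cancel the token's TimerTask */ }
}
{code}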
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.3.patch bq. In YarnClusterMetricsPBImpl, should the default num*NodeManagers return 0 ? Fixed. bq. getApplications in YarnClient.java may be an abstract method? It's a public class. Adding an abstract method will break compatibility. bq. “Queue Applications:” - if it’s aggregated number , maybe Queue(s) ? Fixed. Some other changes: # Based on an offline conversation with Jian, I've moved the app reports cache into TopCLI itself for now. # Improved help # The queue memory statistics are in GB instead of MB. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch, apache-yarn-3348.2.patch, apache-yarn-3348.3.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)