date:20141229


[ 
https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259942#comment-14259942
 ] 

Rohith commented on YARN-2991:
--

I am able to reproduce this intermittently in Eclipse; looking into the root cause.

 TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on 
 trunk
 --

 Key: YARN-2991
 URL: https://issues.apache.org/jira/browse/YARN-2991
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Rohith
Priority: Blocker

 {code}
 Error Message
 test timed out after 6 milliseconds
 Stacktrace
 java.lang.Exception: test timed out after 6 milliseconds
   at java.lang.Object.wait(Native Method)
   at java.lang.Thread.join(Thread.java:1281)
   at java.lang.Thread.join(Thread.java:1355)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:150)
   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
   at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1106)
   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testDecomissionedNMsMetricsOnRMRestart(TestRMRestart.java:1873)
 {code}
 It happened twice this month:
 https://builds.apache.org/job/PreCommit-YARN-Build/6096/
 https://builds.apache.org/job/PreCommit-YARN-Build/6182/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk


 [ 
https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2991:
-
Attachment: 0001-YARN-2991.patch

 TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on 
 trunk
 --

 Key: YARN-2991
 URL: https://issues.apache.org/jira/browse/YARN-2991
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2991.patch



[jira] [Commented] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk


[ 
https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260035#comment-14260035
 ] 

Rohith commented on YARN-2991:
--

In serviceStop(), eventHandlingThread is interrupted and then joined so that the
thread can complete. The test case uses DrainDispatcher, which creates its own
thread. The real cause of the randomness is that when Thread.interrupt() is
called, the thread is not guaranteed to receive an InterruptedException unless
it is blocked. So there should be a mechanism to exit the thread by setting a
boolean flag that the while loop checks.
I updated the patch to handle this. I ran the test many times and it no longer
hangs.
Kindly review the patch.
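
The flag-based exit described in this comment can be sketched roughly as follows. This is not the attached patch: the class name StoppableDispatcher, the stopped flag, and the 100 ms poll interval are all illustrative assumptions, and the real AsyncDispatcher may be structured differently.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only; this is not the real AsyncDispatcher.
class StoppableDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // volatile so the worker thread sees the write made by stop()
    private volatile boolean stopped = false;
    private final Thread worker = new Thread(this::loop, "dispatcher-sketch");

    private void loop() {
        // The flag lets the loop exit even if a handler swallowed the interrupt.
        while (!stopped && !Thread.currentThread().isInterrupted()) {
            try {
                Runnable event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event != null) {
                    event.run();
                }
            } catch (InterruptedException e) {
                // Interrupted while blocked in poll(): restore the status and exit.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void start() {
        worker.start();
    }

    void post(Runnable event) {
        queue.add(event);
    }

    // Sets the flag, interrupts, and waits up to timeoutMs for the worker to exit.
    boolean stop(long timeoutMs) throws InterruptedException {
        stopped = true;
        worker.interrupt();
        worker.join(timeoutMs);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableDispatcher d = new StoppableDispatcher();
        d.start();
        // A handler that swallows the interrupt, which would defeat an
        // interrupt-only shutdown.
        d.post(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // interrupt status is lost here on purpose
            }
        });
        Thread.sleep(10);
        System.out.println("worker exited: " + d.stop(2000));
    }
}
```

The handler in main deliberately swallows the interrupt; the loop still terminates because it re-checks the flag after each event, so the join in stop() does not depend on the interrupt being observed.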

 TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on 
 trunk
 --

 Key: YARN-2991
 URL: https://issues.apache.org/jira/browse/YARN-2991
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2991.patch



[jira] [Updated] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk


 [ 
https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2991:
-
Target Version/s: 2.7.0

 TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on 
 trunk
 --

 Key: YARN-2991
 URL: https://issues.apache.org/jira/browse/YARN-2991
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2991.patch



[jira] [Commented] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk


[ 
https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260179#comment-14260179
 ] 

Hadoop QA commented on YARN-2991:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689320/0001-YARN-2991.patch
  against trunk revision 1454efe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6202//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6202//console

This message is automatically generated.

 TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on 
 trunk
 --

 Key: YARN-2991
 URL: https://issues.apache.org/jira/browse/YARN-2991
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2991.patch



[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-29 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260263#comment-14260263
]

Zhijie Shen commented on YARN-2936:
---

bq. Maybe a simply way is to do this:

Whether write() is called before getProto() or vice versa, the builder object
is set twice. Would it be better to restore the overridden setters/getters and
implement them properly, along with getProto(), similar to what we have done in
a PBImpl?

YARNDelegationTokenIdentifier doesn't set proto.builder now
---

Key: YARN-2936
URL: https://issues.apache.org/jira/browse/YARN-2936
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
Attachments: YARN-2936.001.patch, YARN-2936.002.patch,
YARN-2936.003.patch

After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier,
so that when constructing an object which extends
YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on,
when we call getProto() on it, we just get an empty proto object.
This seems to do no harm to the production code path, as we always call
getBytes() before using proto to persist the DT in the state store, when we
generate the password.
I think the setters were removed to avoid setting the fields twice when
getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work
properly alone: it is tightly coupled with the logic in secretManager, and it
is fragile if something changes in secretManager. For example, in the test
case of YARN-2837, I spent time figuring out that we need to execute
getBytes() first to make sure the testing DTs can be properly put into the
state store.


[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice

2014-12-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260268#comment-14260268
 ] 

Zhijie Shen commented on YARN-2938:
---

+1, will commit the patch

 Fix new findbugs warnings in hadoop-yarn-resourcemanager and 
 hadoop-yarn-applicationhistoryservice
 --

 Key: YARN-2938
 URL: https://issues.apache.org/jira/browse/YARN-2938
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: FindBugs Report.html, YARN-2938.001.patch, 
 YARN-2938.002.patch, YARN-2938.003.patch







[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase


[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260270#comment-14260270
 ] 

Karthik Kambatla commented on YARN-2797:


FIFO and Capacity schedulers share a lot of code. Given that and the long 
duration of the tests, I felt it was okay not to include FIFO. This was the main 
reason; the rest of the tests don't have FIFO configs either. 

In any case, we should move some of these tests to a different profile (or 
module) so that the unit tests don't take as long to run. Once we do that, 
maybe we can just add FIFO to the list? 

 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Attachments: yarn-2797-1.patch







[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice

2014-12-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260285#comment-14260285
 ] 

Hudson commented on YARN-2938:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6793 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6793/])
YARN-2938. Fixed new findbugs warnings in hadoop-yarn-resourcemanager and 
hadoop-yarn-applicationhistoryservice. Contributed by Varun Saxena. (zjshen: 
rev 241d3b3a50c6af92f023d8b2c24598f4813f4674)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java


 Fix new findbugs warnings in hadoop-yarn-resourcemanager and 
 hadoop-yarn-applicationhistoryservice
 --

 Key: YARN-2938
 URL: https://issues.apache.org/jira/browse/YARN-2938
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: FindBugs Report.html, YARN-2938.001.patch, 
 YARN-2938.002.patch, YARN-2938.003.patch







[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260321#comment-14260321
 ] 

Zhijie Shen commented on YARN-2936:
---

Or, to keep it simple, for
{code}
builder.setOwner(getOwner().toString());
builder.setRenewer(getRenewer().toString());
builder.setRealUser(getRealUser().toString());
builder.setIssueDate(getIssueDate());
builder.setMaxDate(getMaxDate());
builder.setSequenceNumber(getSequenceNumber());
builder.setMasterKeyId(getMasterKeyId());
{code}
Can we do something like
{code}
if (!builder.getOwner().equals(getOwner().toString())) {
  builder.setOwner(getOwner().toString());
}
{code}
so that the builder is only set when the value has been updated?
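
A self-contained sketch of this "set only when changed" idea follows. FakeBuilder is a hypothetical stand-in for the generated protobuf builder, and owner is a plain String here rather than the type used by the real identifier class; only the comparison pattern is the point, not the actual YARNDelegationTokenIdentifier code.

```java
import java.util.Objects;

// Hypothetical sketch; not the real YARNDelegationTokenIdentifier.
class TokenIdentifierSketch {
    // Stand-in for the generated protobuf builder; counts how often it is written.
    static final class FakeBuilder {
        private String owner = "";
        private int setCalls = 0;

        String getOwner() {
            return owner;
        }

        FakeBuilder setOwner(String newOwner) {
            owner = Objects.requireNonNull(newOwner);
            setCalls++;
            return this;
        }

        int getSetCalls() {
            return setCalls;
        }
    }

    private final FakeBuilder builder = new FakeBuilder();
    private String owner = "";

    void setOwner(String owner) {
        this.owner = owner;
    }

    String getOwner() {
        return owner;
    }

    // Copies the field into the builder only when the two values differ.
    FakeBuilder getProto() {
        if (!builder.getOwner().equals(getOwner())) {
            builder.setOwner(getOwner());
        }
        return builder;
    }

    public static void main(String[] args) {
        TokenIdentifierSketch id = new TokenIdentifierSketch();
        id.setOwner("alice");
        id.getProto();
        id.getProto(); // second call finds the builder already up to date
        System.out.println("builder writes: " + id.getProto().getSetCalls());
    }
}
```

With this shape, calling getProto() repeatedly, or calling it after another method has already synced the builder, does not write the same value twice.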

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch



[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260329#comment-14260329
]

Jian He commented on YARN-2936:
---

bq. Can we do something like
Thanks, Zhijie! +1 for this.

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover


[ 
https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260344#comment-14260344
 ] 

Jian He commented on YARN-2062:
---

Is this still happening often, given that we clean all the RMNodes in the 
context on failover?

 Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
 ---

 Key: YARN-2062
 URL: https://issues.apache.org/jira/browse/YARN-2062
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2062-1.patch


 On busy clusters, we see several 
 {{org.apache.hadoop.yarn.state.InvalidStateTransitonException}} for events 
 invoked against NEW nodes. 




[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover


[ 
https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260346#comment-14260346
 ] 

Karthik Kambatla commented on YARN-2062:


I haven't checked it recently. Did we make the change you mention after I 
reported this? If so, I'll be happy to close this as Not a problem. 

 Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
 ---

 Key: YARN-2062
 URL: https://issues.apache.org/jira/browse/YARN-2062
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2062-1.patch



[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase


[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260347#comment-14260347
 ] 

Karthik Kambatla commented on YARN-2797:


By the way, TestRMRestart passes locally. 

 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Attachments: yarn-2797-1.patch







[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover


[ 
https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260360#comment-14260360
 ] 

Jian He commented on YARN-2062:
---

That was before this was reported. Do you still remember exactly which invalid 
event happened? I'm trying to understand how this happened.

 Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
 ---

 Key: YARN-2062
 URL: https://issues.apache.org/jira/browse/YARN-2062
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2062-1.patch



[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Varun Saxena updated YARN-2936:
---
Attachment: YARN-2936.004.patch

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator


[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260427#comment-14260427
 ] 

Karthik Kambatla commented on YARN-2716:


We kind of need CURATOR-111 for this. Posting a patch for that. 

 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Robert Kanter

 Per the suggestion by [~kasha] in YARN-2131, it would be nice to use Curator to 
 simplify the retry logic in ZKRMStateStore.




[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs


[ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260430#comment-14260430
 ] 

Varun Saxena commented on YARN-2987:


[~jianhe] / [~zjshen], kindly review

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch


 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  




[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now


[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260445#comment-14260445
 ] 

Hadoop QA commented on YARN-2936:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689373/YARN-2936.004.patch
  against trunk revision 241d3b3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6203//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6203//console

This message is automatically generated.

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch



[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs


[ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260457#comment-14260457
 ] 

Jian He commented on YARN-2987:
---

Looks good overall. Could you add a test case verifying that a non-authorized 
user is not able to get the application report?
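
The behavior such a test would exercise might look roughly like the sketch below. Everything in it is hypothetical: App, visibleAppIds, and the owner-only rule are illustrative stand-ins, not the ClientRMService API or the ACL checks in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; none of these names are YARN APIs.
class QueueInfoAclSketch {
    static final class App {
        final String id;
        final String owner;

        App(String id, String owner) {
            this.id = id;
            this.owner = owner;
        }
    }

    // Returns only the ids of applications the caller is allowed to view.
    static List<String> visibleAppIds(String caller, List<App> queueApps,
                                      BiPredicate<String, App> canView) {
        List<String> visible = new ArrayList<>();
        for (App app : queueApps) {
            if (canView.test(caller, app)) {
                visible.add(app.id);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
                new App("app_1", "alice"),
                new App("app_2", "bob"));
        // Assumed rule for the sketch: users may only view their own applications.
        BiPredicate<String, App> ownerOnly = (user, app) -> user.equals(app.owner);
        System.out.println(visibleAppIds("bob", apps, ownerOnly));
        System.out.println(visibleAppIds("mallory", apps, ownerOnly));
    }
}
```

A real test would replace the owner-only predicate with the RM's actual ACL check and assert that a user without view permission gets no application reports back from getQueueInfo.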

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch



[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260463#comment-14260463
]

Jian He commented on YARN-2936:
---

Looks good, one nit:
{{builder.getOwner()}} already returns a String, so the toString() call in
{{builder.getOwner().toString()}} is unnecessary; the same applies to
getRenewer and getUser.

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI


 [ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2943:
-
Attachment: YARN-2943.3.patch

The patch didn't apply to the latest trunk; uploaded an updated patch.

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch


 Now that we have node labels in the system, there is no very convenient way to 
 get information such as: how many active NMs are assigned to a given label, how 
 much total resource a given label has, and which queues can access a given 
 label.
 It would be better to add a node-labels page to the RM web UI, so that 
 users/admins have a centralized view of this information.




[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Varun Saxena updated YARN-2936:
---
Attachment: YARN-2936.005.patch

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Varun Saxena updated YARN-2936:
---
Attachment: (was: YARN-2936.005.patch)

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260477#comment-14260477
]

Jian He commented on YARN-2936:
---

Just one more thing: the newly added test passes even without the core change.
Could you update the test so that it passes with the core change but fails
without it?

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler

2014-12-29 Thread Anubhav Dhoot (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260491#comment-14260491
]

Anubhav Dhoot commented on YARN-2881:
-

Hi [~subru] thanks for your review

bq. Are you assuming that parent queue names are unique in FS?
I am assuming the names are all fully qualified, both when clients refer to a
queue name while managing reservations, and during the implementation of fair
scheduler's reservation portion. This is in contrast to the
CapacityScheduler's reservation portion.

bq. run() need not be synchronized. I know this is from previous code but it
would be good to clean it up since we are refactoring the code.
AbstractPlanFollower::plans is modified from multiple places and that seems the
only protection for it.

bq. getChildReservationQueues() could be implemented by the
AbstractSchedulerPlanFollower using Queue::getQueueInfo ?
That will only give us QueueInfos for the child queues. Rest of the code deals
in Queue (eg getPlanQueue). So I would prefer leaving this as is.

bq. I think we can add a getResourceCalculator to YarnScheduler as it makes
sense. Then we need not override calculateTargetCapacity() and
isPlanResourcesLessThanReservations().
Done.

bq. Minor: spurious blank lines in imports of CapacitySchedulerPlanFollower
and FairSchedulerPlanFollower.
Done

Implement PlanFollower for FairScheduler

Key: YARN-2881
URL: https://issues.apache.org/jira/browse/YARN-2881
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: YARN-2881.001.patch, YARN-2881.prelim.patch


[jira] [Updated] (YARN-2881) Implement PlanFollower for FairScheduler

2014-12-29 Thread Anubhav Dhoot (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2881:

Attachment: YARN-2881.002.patch

Addressing [~subru]'s comments and FindBugs

 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, 
 YARN-2881.prelim.patch







[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now


[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260496#comment-14260496
 ] 

Hadoop QA commented on YARN-2936:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689373/YARN-2936.004.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6204//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6204//console

This message is automatically generated.

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch


 After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, 
 so when constructing an object that extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call getProto() on it, we just get an empty proto object.
 This seems to do no harm on the production code path, as we always call 
 getBytes() when generating the password, before using the proto to persist 
 the DT in the state store.
 I think the setters were removed to avoid setting the fields a second time 
 when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone. YARNDelegationTokenIdentifier is tightly coupled with the 
 logic in secretManager, and it is vulnerable if something changes in 
 secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the 
 testing DTs can be properly put into the state store.
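
The coupling described above can be illustrated with a deliberately simplified sketch. SketchTokenIdentifier and its fields are hypothetical stand-ins, not the actual Hadoop classes; the only thing assumed from the description is that the builder gets populated as a side effect of getBytes().

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an identifier whose "proto builder" is only
// filled in as a side effect of getBytes(); not the real Hadoop class.
final class SketchTokenIdentifier {
  private final String owner;
  private final String renewer;
  // Plays the role of proto.builder: empty until something populates it.
  private final Map<String, String> builder = new LinkedHashMap<>();

  SketchTokenIdentifier(String owner, String renewer) {
    this.owner = owner;
    this.renewer = renewer;
  }

  // Copies the fields into the builder, then serializes it.
  byte[] getBytes() {
    builder.put("owner", owner);
    builder.put("renewer", renewer);
    return builder.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Returns a snapshot of the builder; empty if getBytes() was never called.
  Map<String, String> getProto() {
    return new LinkedHashMap<>(builder);
  }
}
```

A caller that persists getProto() without first calling getBytes() would store an empty object in this sketch, which is the kind of ordering dependency the description warns about.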




[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI


[ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260497#comment-14260497
 ] 

Hadoop QA commented on YARN-2943:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689380/YARN-2943.3.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6205//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6205//console

This message is automatically generated.

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch



[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now


[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260502#comment-14260502
 ] 

Hadoop QA commented on YARN-2936:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689383/YARN-2936.005.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6206//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6206//console

This message is automatically generated.

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch



[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260519#comment-14260519
]

Varun Saxena commented on YARN-2936:

eclipse:eclipse is failing due to some problem in Jenkins. The following message appears:
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build@2/dev-support/test-patch.sh: line 692: /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build@2/../patchprocess/patchEclipseOutput.txt: No such file or directory

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now


[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260525#comment-14260525
 ] 

Varun Saxena commented on YARN-2936:


eclipse:eclipse passes in my local build

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch



[jira] [Updated] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs


 [ 
https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2748:
---
Attachment: YARN-2748.002.patch

 Upload logs in the sub-folders under the local log dir when aggregating logs
 

 Key: YARN-2748
 URL: https://issues.apache.org/jira/browse/YARN-2748
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2748.001.patch, YARN-2748.002.patch


 YARN-2734 has a temporary fix that skips sub-folders to avoid an exception. 
 Ideally, if the app is creating a sub-folder and putting its rolling logs 
 there, we need to upload these logs as well.




[jira] [Commented] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs


[ 
https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260555#comment-14260555
 ] 

Varun Saxena commented on YARN-2748:


bq. To differentiate log files which may have same file names(due to 
subfolders), I think we can write file path relative to container log directory 
instead. Your views on this. 
bq. Given Log Root Dir/sub-dir1/sub-dir2/.../.log, we can use the relative 
path sub-dir1/sub-dir2/.../.log to uniquely identify a log.
[~zjshen], the latest patch uses the relative path to identify a log in the 
aggregated log file. Kindly review.
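
As a rough illustration of the relative-path idea quoted above (not the code from the patch), the name can be derived with java.nio.file.Path.relativize. The class name, method name, and directory layout below are assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: names a log by its path relative to the container
// log directory, so same-named files in different sub-folders stay distinct.
final class RelativeLogName {
  static String relativeLogName(Path containerLogDir, Path logFile) {
    return containerLogDir.relativize(logFile).toString();
  }

  public static void main(String[] args) {
    Path root = Paths.get("/logs/container_01");
    Path a = root.resolve("sub-dir1").resolve("app.log");
    Path b = root.resolve("sub-dir2").resolve("app.log");
    // Both files are called app.log, but their relative names differ.
    System.out.println(relativeLogName(root, a));
    System.out.println(relativeLogName(root, b));
  }
}
```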

 Upload logs in the sub-folders under the local log dir when aggregating logs
 

 Key: YARN-2748
 URL: https://issues.apache.org/jira/browse/YARN-2748
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2748.001.patch, YARN-2748.002.patch



[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

[
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260568#comment-14260568
]

Jian He commented on YARN-2936:
---

bq. the newly added test is passing without the core change. could you update
the test to pass with the core change but fail without the change ?
Thanks for updating. Could you check whether my last comment makes sense? Thanks.

YARNDelegationTokenIdentifier doesn't set proto.builder now
---


[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler


[ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260574#comment-14260574
 ] 

Hadoop QA commented on YARN-2881:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689386/YARN-2881.002.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6207//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6207//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6207//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6207//console

This message is automatically generated.

 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, 
 YARN-2881.prelim.patch







[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs


 [ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2987:
---
Attachment: YARN-2987.002.patch

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch, YARN-2987.002.patch


 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  
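
A minimal sketch of the kind of check that is missing, assuming a per-application view predicate is available. QueueAppFilter, visibleApps, and the canView predicate are hypothetical names for illustration and do not reflect the actual ClientRMService code or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical filter: returns only the applications the caller may view.
final class QueueAppFilter {
  static <A> List<A> visibleApps(String caller, List<A> apps,
      BiPredicate<String, A> canView) {
    List<A> visible = new ArrayList<>();
    for (A app : apps) {
      // Skip applications for which the ACL check fails.
      if (canView.test(caller, app)) {
        visible.add(app);
      }
    }
    return visible;
  }
}
```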




[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs


[ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260576#comment-14260576
 ] 

Varun Saxena commented on YARN-2987:


bq. could you add a test case that a non-authorized user not able to get the 
application report ?
[~jianhe], added the case. Kindly review. 

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch, YARN-2987.002.patch



[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily


[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260594#comment-14260594
 ] 

Wangda Tan commented on YARN-2933:
--

Hi [~mayank_bansal],
The overall approach looks good to me, thanks for the update.

Some comments about implementation details:
1) You can use {{clusterResource = 
rmContext.getNodeLabelManager().getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
 clusterResource);}} instead of computing {{clusterResource = clusterResource - 
all-labeled-resource}}.
2) {{lm.getNodeLabels();}} copies the node-to-labels map, so it is 
expensive to call when deciding whether to preempt each container. I suggest 
getting a node-to-labels map *at the beginning of {{editSchedule}}*. This 
presumes the node-to-labels mapping does not change during the preemption 
policy execution, but I think that is reasonable since we already presume 
queue resources do not change during preemption policy execution. In 
addition, {{isLabeledContainer}} can then use the map instead of looping over 
every entry.
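
Point 2 above could look roughly like the following sketch. LabelSnapshot is a hypothetical class, node ids are plain strings for brevity, and the Supplier stands in for a copying call such as lm.getNodeLabels(); none of this is taken from the patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: take one copy of the node-to-labels map per policy
// run instead of copying it for every container that is examined.
final class LabelSnapshot {
  private final Map<String, Set<String>> nodeToLabels;

  // labelSource is invoked exactly once, when the snapshot is created
  // (for example at the beginning of an editSchedule-style method).
  LabelSnapshot(Supplier<Map<String, Set<String>>> labelSource) {
    this.nodeToLabels = labelSource.get();
  }

  // Map lookup instead of a loop: a container counts as labeled when the
  // node it runs on has at least one label.
  boolean isLabeledContainer(String nodeId) {
    Set<String> labels = nodeToLabels.get(nodeId);
    return labels != null && !labels.isEmpty();
  }
}
```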

Regarding the test,
I think it covers one case, which is _do not preempt containers from NMs 
with labels_. Another case that I think needs coverage is verifying that 
ideal_allocation changes according to this patch. An example is:

{code}
cluster.no_label.resource = 100
cluster.label-x.resource = 100

root.A.capacity = 40
root.A.label-x.capacity = 50
root.A.no_label.used = 40
root.A.label-x.used = 50

root.B.capacity = 40
root.B.label-x.capacity = 50
root.B.no_label.used = 50 
root.B.label-x.used = 0

root.C.capacity = 20
root.C.pending = 10
root.C.used = 10

root.C should preempt 10 from B instead of from A. Even though A's total used 
resource = 90, A's no-label used resource (40) is still within its guaranteed 
no-label resource (40).
{code}
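
The numbers in the example can be checked with a small worked sketch. It assumes that a queue's no-label guarantee is its capacity percentage of the cluster's no-label resource; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Worked version of the example: only no-label usage is compared against
// the no-label guarantee. Names are hypothetical.
final class NoLabelPreemptionExample {
  // Amount a queue holds beyond its guaranteed share of no-label resource.
  static int excess(int clusterNoLabel, int capacityPercent, int noLabelUsed) {
    int guaranteed = clusterNoLabel * capacityPercent / 100;
    return Math.max(0, noLabelUsed - guaranteed);
  }

  public static void main(String[] args) {
    int clusterNoLabel = 100;
    Map<String, Integer> excessByQueue = new LinkedHashMap<>();
    excessByQueue.put("root.A", excess(clusterNoLabel, 40, 40));
    excessByQueue.put("root.B", excess(clusterNoLabel, 40, 50));
    // prints {root.A=0, root.B=10}
    System.out.println(excessByQueue);
  }
}
```

Under that assumption only root.B is over its no-label guarantee, so the 10 units pending for root.C would come from B.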

Does this make sense to you?


 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 aims to support preemption that respects node labels, but we have 
 some gaps in the code base; for example, queues/FiCaScheduler should be able 
 to get usedResource/pendingResource, etc. by label. These items potentially 
 require refactoring CS, which we need to spend some time thinking about 
 carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid a 
 regression like this: a cluster has some nodes with labels and some without; 
 assume queueA isn't satisfied for resource without label; currently, the 
 preemption policy may preempt resource from nodes with labels for queueA, 
 which is not correct.
 Again, this is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node labels for Capacity Scheduler, which is our final 
 target. 




[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-12-29 Thread Craig Welch (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.20.patch

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.6.patch, 
 YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 When a new application is submitted to the RM, it checks whether the app can 
 be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() <
       getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example:
 If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the 
 number of AMs that can be launched is 200. If each AM actually uses 5M 
 (> minimum_allocation), all 200 apps can still be activated, and they will 
 occupy all the resource of the queue instead of only 
 max_am_resource_percent of it.
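
The arithmetic of this example can be reproduced with a short sketch. All values are in MB, 1G is rounded to 1000M as in the example, and the class and method names are hypothetical.

```java
// Reproduces the arithmetic of the example; names are hypothetical.
final class AmLimitExample {
  // Number of AMs the queue admits when the limit is counted in
  // minimum-allocation units rather than in actual AM size.
  static int maxAmCount(int queueCapacityMb, double maxAmPercent, int minAllocMb) {
    int maxAmResourceMb = (int) Math.round(queueCapacityMb * maxAmPercent);
    return maxAmResourceMb / minAllocMb;
  }

  public static void main(String[] args) {
    int capacityMb = 1000;              // "1G" in the example
    int amCount = maxAmCount(capacityMb, 0.2, 1);
    int actualAmUsageMb = amCount * 5;  // each AM really uses 5M
    // prints: 200 AMs use 1000M of 1000M
    System.out.println(amCount + " AMs use " + actualAmUsageMb + "M of "
        + capacityMb + "M");
  }
}
```

The count-based limit admits 200 AMs, but at 5M each they consume the whole queue rather than the intended 200M.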




[jira] [Updated] (YARN-2217) Shared cache client side changes

2014-12-29 Thread Chris Trezzo (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2217:
---
Attachment: YARN-2217-trunk-v5.patch

[~kasha] V5 attached.

1. Removed isSCMAvailable logic (moving it to MR layer).
2. Surface exceptions through the api.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch


 Implement the client side changes for the shared cache.




[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI


[ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260598#comment-14260598
 ] 

Jian He commented on YARN-2943:
---

Looks good overall, a few minor comments:
- getNActiveNMs -> getNumActiveNMs
- Label class and RMNodeLabelInfo class can be consolidated into one
- Probably add a common method like addNode in Label class to update 
numActiveNMs and resource altogether. 
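
The last suggestion might look something like the sketch below. LabelSketch is a hypothetical class, and a single memory figure in MB stands in for a full Resource object; this is not the code from the patch.

```java
// Hypothetical sketch of one consolidated label class that keeps the
// active-NM count and the total resource in step.
final class LabelSketch {
  private final String name;
  private int numActiveNMs;
  private long memoryMb;

  LabelSketch(String name) {
    this.name = name;
  }

  // Updates the NM count and the resource total together.
  synchronized void addNode(long nodeMemoryMb) {
    numActiveNMs++;
    memoryMb += nodeMemoryMb;
  }

  // Reverses addNode when a node leaves the label.
  synchronized void removeNode(long nodeMemoryMb) {
    numActiveNMs--;
    memoryMb -= nodeMemoryMb;
  }

  String getName() {
    return name;
  }

  synchronized int getNumActiveNMs() {
    return numActiveNMs;
  }

  synchronized long getMemoryMb() {
    return memoryMb;
  }
}
```

Keeping both updates in one method means callers cannot change the count without also changing the resource total.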

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch



[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs


[ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260602#comment-14260602
 ] 

Jian He commented on YARN-2987:
---

looks good, +1

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch, YARN-2987.002.patch



[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI


 [ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2943:
-
Attachment: YARN-2943.4.patch

Thanks for the comments, [~jianhe]; all are addressed in the new patch. Please 
kindly review.

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch



[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI


 [ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2943:
-
Attachment: YARN-2943.5.patch

Added missing apache license to new file.

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch, 
 YARN-2943.5.patch



[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs


[ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260662#comment-14260662
 ] 

Hadoop QA commented on YARN-2987:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689406/YARN-2987.002.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6209//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6209//console

This message is automatically generated.

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch, YARN-2987.002.patch


 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation


[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260665#comment-14260665
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689408/YARN-2637.20.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6211//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6211//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6211//console

This message is automatically generated.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.6.patch, 
 YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app can be 
 activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();
   
   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }
   
   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < 
       getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() + 
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
 the number of AMs that can be launched is 200. If the user uses 5M for each AM 
 (> minimum_allocation), all apps can still be activated, and they will occupy 
 all of the queue's resources instead of only max_am_resource_percent of the queue.
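
 The arithmetic in the example above can be checked with a small sketch. The 
 class and method names below are hypothetical and are not taken from the 
 scheduler code; user limits are ignored, all sizes are in MB, and 1G is 
 treated as 1000M as in the description.

```java
// Hypothetical illustration of the example above; not scheduler code.
final class AmLimitExample {
  /** Resource (MB) that AMs are allowed to use in the queue. */
  static long maxAmResourceMb(long queueCapacityMb, double maxAmResourcePercent) {
    return Math.round(queueCapacityMb * maxAmResourcePercent);
  }

  /** Number of AMs allowed when the limit is derived from minimum_allocation. */
  static long maxAmNumber(long maxAmResourceMb, long minimumAllocationMb) {
    return maxAmResourceMb / minimumAllocationMb;
  }

  /** Resource (MB) actually consumed if every allowed AM is activated. */
  static long amResourceUsedMb(long amCount, long perAmMb) {
    return amCount * perAmMb;
  }
}
```

 With a capacity of 1000, a percent of 0.2 and a minimum allocation of 1, the 
 AM count is 200; at 5M per AM those 200 AMs use 1000M, which is the whole 
 queue rather than the intended 200M.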



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2217) Shared cache client side changes


[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260666#comment-14260666
 ] 

Hadoop QA commented on YARN-2217:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12689409/YARN-2217-trunk-v5.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
10 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6210//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6210//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6210//console

This message is automatically generated.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI


[ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260701#comment-14260701
 ] 

Hadoop QA commented on YARN-2943:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689419/YARN-2943.5.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6212//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6212//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6212//console

This message is automatically generated.

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch, 
 YARN-2943.5.patch


 Now we have node labels in the system, but there is no convenient way to 
 get information such as: how many active NMs are assigned to a given label? 
 How much total resource is there for a given label? Which queues can access 
 a given label?
 It would be better to add a node-labels page to the RM web UI, so users/admins 
 have a centralized view of such information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2477) DockerContainerExecutor must support secure mode

2014-12-29 Thread Eron Wright (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260745#comment-14260745
]

Eron Wright commented on YARN-2477:

A key question here is whether it is necessary for the container to be capable
of Kerberos authentication. Considering how tasks primarily use delegation
tokens rather than Kerberos auth, the ability might not be important. A
valid scenario might be appmasters with Kerberized endpoints.

By running in a container, the application loses access to two relevant files
on the host filesystem: a) the /etc/krb5.conf file, and b) the installed JCE
policy files (which Abin alludes to). Those files may vary by environment and
are typically managed by Ambari/Cloudera Manager. On a), one solution is for
the DockerContainerExecutor to share /etc/krb5.conf into the container. On
b), I think it is acceptable to defer the JCE issue and assume that the image will
contain the needed policy. I believe that the steps to install a JCE policy
vary by Linux distribution (some use 'alternatives').
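
As a rough sketch of option a), the executor could add a read-only bind mount
for /etc/krb5.conf when it builds the docker command line. The helper below is
hypothetical and is not part of DockerContainerExecutor; the read-only mode and
the image name used in the usage note are assumptions. It relies only on the
"-v host:container:ro" form of "docker run".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of DockerContainerExecutor.
final class Krb5MountSketch {
  static final String KRB5_CONF = "/etc/krb5.conf";

  /** Builds a "docker run" argument list that bind-mounts krb5.conf read-only. */
  static List<String> dockerRunArgs(String image, List<String> command) {
    List<String> runArgs = new ArrayList<>();
    runArgs.add("docker");
    runArgs.add("run");
    runArgs.add("-v");
    // host path : container path : read-only
    runArgs.add(KRB5_CONF + ":" + KRB5_CONF + ":ro");
    runArgs.add(image);
    runArgs.addAll(command);
    return runArgs;
  }
}
```

For instance, dockerRunArgs("example/image", Arrays.asList("id")) would place
the mount option before the image name, so the container sees the host's
Kerberos configuration without being able to modify it.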

DockerContainerExecutor must support secure mode

Key: YARN-2477
URL: https://issues.apache.org/jira/browse/YARN-2477
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Abin Shahab
Labels: security

DockerContainerExecutor (patch in YARN-1964) does not support Kerberized
Hadoop clusters yet, as a Kerberized Hadoop cluster has a strict dependency on
the LinuxContainerExecutor.
For Docker containers to be used in a production environment, they must support
secure Hadoop. Issues regarding Java's AES encryption library in a
containerized environment also need to be worked out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance

2014-12-29 Thread Yi Liu (JIRA)

Yi Liu created YARN-2996:


 Summary: Refine some fs operations in FileSystemRMStateStore to 
improve performance
 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu


In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
performance:
*1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; 
we can merge them to save one RPC call.
{code}
if (fs.exists(versionNodePath)) {
FileStatus status = fs.getFileStatus(versionNodePath);
{code}

*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
The {{updateFile}} method is also inefficient: it writes the file to 
_output\_file_.tmp, then renames it to _output\_file_.new, then renames it to 
_output\_file_. We can eliminate one rename operation.

There is also one unnecessary import, which we can remove.
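
A minimal sketch of the merge described in point 1, assuming the lookup throws 
{{FileNotFoundException}} for a missing path (which is how Hadoop's 
{{FileSystem.getFileStatus}} is documented to behave). The {{StatusSource}} 
interface and {{SingleCallLookup}} class are hypothetical stand-ins so the 
sketch compiles without Hadoop on the classpath; this is not the code in the 
patch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in for the one file system call the sketch needs.
interface StatusSource {
  /** Returns the status of the path, or throws FileNotFoundException if absent. */
  String getFileStatus(String path) throws IOException;
}

final class SingleCallLookup {
  /** Returns the status, or null if the path does not exist, using one call. */
  static String statusOrNull(StatusSource fs, String path) throws IOException {
    try {
      return fs.getFileStatus(path);
    } catch (FileNotFoundException e) {
      // The path is absent; no separate exists() call was needed to learn that.
      return null;
    }
  }
}
```

The caller then branches on a null result instead of calling {{fs.exists}} 
first, so each lookup costs one call instead of two.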



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance

2014-12-29 Thread Yi Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-2996:
-
Attachment: YARN-2996.001.patch

 Refine some fs operations in FileSystemRMStateStore to improve performance
 --

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2996.001.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* Several places invoke {{fs.exists}} and then 
 {{fs.getFileStatus}}; we can merge them to save one RPC call.
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
 ".new");
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} method is also inefficient: it writes the file to 
 _output\_file_.tmp, then renames it to _output\_file_.new, then renames it to 
 _output\_file_. We can eliminate one rename operation.
 There is also one unnecessary import, which we can remove.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-2997) NM keeps sending finished containers to RM until app is finished

Chengbing Liu created YARN-2997:
---

 Summary: NM keeps sending finished containers to RM until app is 
finished
 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu


We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{quote}getRMContainer{quote} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI


 [ 
https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2943:
-
Attachment: YARN-2943.6.patch

Addressed the findbugs warning. The failed test seems unrelated to this patch.

 Add a node-labels page in RM web UI
 ---

 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, 
 YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch, 
 YARN-2943.5.patch, YARN-2943.6.patch


 Now we have node labels in the system, but there is no convenient way to 
 get information such as: how many active NMs are assigned to a given label? 
 How much total resource is there for a given label? Which queues can access 
 a given label?
 It would be better to add a node-labels page to the RM web UI, so users/admins 
 have a centralized view of such information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished


 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Description: 
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{getRMContainer} returns null.

  was:
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{quote}getRMContainer{quote} returns null.


 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu

 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {getRMContainer} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished


 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Description: 
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{{getRMContainer}} returns null.

  was:
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{getRMContainer} returns null.


 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu

 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished


 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Attachment: YARN-2997.patch

Report each completed container to the RM only once, by not calling 
{{containerStatuses.add(containerStatus);}} from the second time on.

Tested on a real cluster and it works well.
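
The idea can be sketched roughly as follows. This is not the code in 
YARN-2997.patch: the class name, the use of plain string IDs and the 
{{alreadyReported}} set are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the code in YARN-2997.patch.
final class CompletedContainerReporter {
  private final Set<String> alreadyReported = new HashSet<>();

  /** Returns the completed container IDs that have not yet been sent to the RM. */
  List<String> statusesForHeartbeat(List<String> completedContainerIds) {
    List<String> containerStatuses = new ArrayList<>();
    for (String id : completedContainerIds) {
      // Set.add returns false when the ID was reported in an earlier heartbeat.
      if (alreadyReported.add(id)) {
        containerStatuses.add(id);
      }
    }
    return containerStatuses;
  }
}
```

The first heartbeat after a container completes includes it; later heartbeats 
skip it, so the RM would no longer receive the same completed container 
repeatedly.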

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished


[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260882#comment-14260882
 ] 

Hadoop QA commented on YARN-2997:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689447/YARN-2997.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6215//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6215//console

This message is automatically generated.

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance