[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006845#comment-14006845
 ] 

Hadoop QA commented on YARN-2088:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3794//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3794//console

This message is automatically generated.

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2088.v1.patch


 Some fields (set, list) are added to the proto builder multiple times; we need 
 to clear those fields before adding, otherwise the resulting proto contains 
 duplicate contents.
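
A minimal sketch of the pattern being described (not the actual YARN-2088 patch; the repeated field name applicationTypes is only illustrative):
{code}
private void mergeLocalToBuilder() {
  if (this.applicationTypes != null && !this.applicationTypes.isEmpty()) {
    // Clear the repeated field first; otherwise repeated merges keep
    // appending and the resulting proto accumulates duplicate entries.
    builder.clearApplicationTypes();
    builder.addAllApplicationTypes(this.applicationTypes);
  }
}
{code}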



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006858#comment-14006858
 ] 

Hadoop QA commented on YARN-2030:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645932/YARN-2030.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3793//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3793//console

This message is automatically generated.

 Use StateMachine to simplify handleStoreEvent() in RMStateStore
 ---

 Key: YARN-2030
 URL: https://issues.apache.org/jira/browse/YARN-2030
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du
Assignee: Binglin Chang
 Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch


 Now the logic to handle the different store events in handleStoreEvent() is as 
 follows:
 {code}
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)
     || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
     ...
   } else {
     ...
   }
   ...
   try {
     if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
       ...
     } else {
       ...
     }
   }
   ...
 } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
     || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
     ...
   } else {
     ...
   }
   ...
     if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
       ...
     } else {
       ...
     }
   }
   ...
 } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
   ...
 } else {
   ...
 }
 }
 {code}
 This not only confuses people but also easily leads to mistakes. We may 
 leverage a state machine to simplify this, even though there are no state 
 transitions.
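
The JIRA proposes the YARN StateMachine utility; as a rough illustration of the same flattening idea, a simple table dispatch could look like the following hedged sketch (the helper method names are hypothetical, not the actual RMStateStore code):
{code}
protected void handleStoreEvent(RMStateStoreEvent event) {
  switch (event.getType()) {
    case STORE_APP:
    case UPDATE_APP:
      storeOrUpdateApp(event);        // hypothetical helper
      break;
    case STORE_APP_ATTEMPT:
    case UPDATE_APP_ATTEMPT:
      storeOrUpdateAppAttempt(event); // hypothetical helper
      break;
    case REMOVE_APP:
      removeApp(event);               // hypothetical helper
      break;
    default:
      LOG.error("Unknown RMStateStore event type: " + event.getType());
  }
}
{code}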



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006871#comment-14006871
 ] 

Hudson commented on YARN-1962:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Timeline server is enabled by default
 -

 Key: YARN-1962
 URL: https://issues.apache.org/jira/browse/YARN-1962
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.4.1

 Attachments: YARN-1962.1.patch, YARN-1962.2.patch


 Since the Timeline server is not yet mature or secured, enabling it by default 
 might create some confusion.
 We were playing with 2.4.0 and found a lot of connection refused exceptions 
 from the distributed shell example. Note that we didn't run the Timeline 
 server because it is not secured yet.
 Although it is possible to explicitly turn it off through the yarn-site 
 config, in my opinion this extra step for a new service is not worthwhile at 
 this point.
 This JIRA is to turn it off by default.
 If there is agreement, I can put up a simple patch for this.
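
For reference, a minimal sketch of disabling the service from client code, assuming the standard yarn.timeline-service.enabled property exposed as YarnConfiguration.TIMELINE_SERVICE_ENABLED:
{code}
YarnConfiguration conf = new YarnConfiguration();
// Equivalent to setting yarn.timeline-service.enabled=false in yarn-site.xml.
conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false);
{code}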
 {noformat}
 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
 from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
 Caused by: java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
   at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR 
 impl.TimelineClientImpl: Failed to get the response from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
  

[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006870#comment-14006870
 ] 

Hudson commented on YARN-2017:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 

[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006866#comment-14006866
 ] 

Hudson commented on YARN-2081:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 TestDistributedShell fails after YARN-1962
 --

 Key: YARN-2081
 URL: https://issues.apache.org/jira/browse/YARN-2081
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 3.0.0, 2.4.1
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2081.patch


 java.lang.AssertionError: expected:<1> but was:<0>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at org.junit.Assert.assertEquals(Assert.java:542)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006874#comment-14006874
 ] 

Hudson commented on YARN-1938:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006867#comment-14006867
 ] 

Hudson commented on YARN-2050:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


 Fix LogCLIHelpers to create the correct FileContext
 ---

 Key: YARN-2050
 URL: https://issues.apache.org/jira/browse/YARN-2050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2050-2.patch, YARN-2050.patch


 LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus 
 the FileContext created isn't necessarily the FileContext for the remote log.
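
A hedged sketch of the fix direction (not the actual YARN-2050 patch), assuming the remote log directory is read from the standard config key; FileContext.getFileContext(URI, Configuration) resolves the context for that filesystem rather than the default one:
{code}
Path remoteRootLogDir = new Path(conf.get(
    YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
    YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
// Derive the FileContext from the remote log dir's URI instead of
// FileContext.getFileContext(), which always uses the default filesystem.
FileContext fc = FileContext.getFileContext(remoteRootLogDir.toUri(), conf);
{code}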



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006869#comment-14006869
 ] 

Hudson commented on YARN-2089:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are 
missing audience annotations. (Zhihai Xu via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


 FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
 audience annotations
 ---

 Key: YARN-2089
 URL: https://issues.apache.org/jira/browse/YARN-2089
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: zhihai xu
  Labels: newbie
 Fix For: 2.5.0

 Attachments: yarn-2089.patch


 We should mark QueuePlacementPolicy and QueuePlacementRule with the audience 
 annotations @Private & @Unstable.
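
A minimal sketch of what that would look like, using the standard Hadoop classification annotations:
{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

@Private
@Unstable
public abstract class QueuePlacementRule {
  // existing class body unchanged
}
{code}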



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006893#comment-14006893
 ] 

Hudson commented on YARN-1962:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Timeline server is enabled by default
 -

 Key: YARN-1962
 URL: https://issues.apache.org/jira/browse/YARN-1962
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.4.1

 Attachments: YARN-1962.1.patch, YARN-1962.2.patch


 Since the Timeline server is not yet mature or secured, enabling it by default 
 might create some confusion.
 We were playing with 2.4.0 and found a lot of connection refused exceptions 
 from the distributed shell example. Note that we didn't run the Timeline 
 server because it is not secured yet.
 Although it is possible to explicitly turn it off through the yarn-site 
 config, in my opinion this extra step for a new service is not worthwhile at 
 this point.
 This JIRA is to turn it off by default.
 If there is agreement, I can put up a simple patch for this.
 {noformat}
 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
 from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
 Caused by: java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
   at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR 
 impl.TimelineClientImpl: Failed to get the response from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
   at 
 

[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006891#comment-14006891
 ] 

Hudson commented on YARN-2089:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are 
missing audience annotations. (Zhihai Xu via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


 FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
 audience annotations
 ---

 Key: YARN-2089
 URL: https://issues.apache.org/jira/browse/YARN-2089
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: zhihai xu
  Labels: newbie
 Fix For: 2.5.0

 Attachments: yarn-2089.patch


 We should mark QueuePlacementPolicy and QueuePlacementRule with the audience 
 annotations @Private & @Unstable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006888#comment-14006888
 ] 

Hudson commented on YARN-2081:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 TestDistributedShell fails after YARN-1962
 --

 Key: YARN-2081
 URL: https://issues.apache.org/jira/browse/YARN-2081
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 3.0.0, 2.4.1
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2081.patch


 java.lang.AssertionError: expected:<1> but was:<0>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at org.junit.Assert.assertEquals(Assert.java:542)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006889#comment-14006889
 ] 

Hudson commented on YARN-2050:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


 Fix LogCLIHelpers to create the correct FileContext
 ---

 Key: YARN-2050
 URL: https://issues.apache.org/jira/browse/YARN-2050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2050-2.patch, YARN-2050.patch


 LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus 
 the FileContext created isn't necessarily the FileContext for the remote log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006896#comment-14006896
 ] 

Hudson commented on YARN-1938:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006892#comment-14006892
 ] 

Hudson commented on YARN-2017:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 

[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006908#comment-14006908
 ] 

Sunil G commented on YARN-1408:
---

bq. we may change it to decrement the resource request only when the container 
is pulled by the AM ?

As [~jianhe] mentioned, this can create problems with subsequent NM heartbeats. 
I also agree that the ALLOCATED state is the best place to preempt a container, 
but this race condition can occur there.
CapacityScheduler raises a KILL event for the RMContainer (for preemption). So a 
solution may be to recreate the resource request if the RMContainer state 
is ALLOCATED/ACQUIRED at that point, as in the sketch below.
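
A rough sketch of that idea (the hook and helper names are hypothetical, not the actual scheduler code): when the KILL arrives while the container is still ALLOCATED/ACQUIRED, put the ask back so the scheduler can allocate a replacement.
{code}
// Hypothetical hook; RMContainerState and RMContainer are real types,
// recoverResourceRequest(...) is an illustrative helper name.
void onContainerKilledByPreemption(RMContainer rmContainer,
    SchedulerApplicationAttempt app) {
  RMContainerState state = rmContainer.getState();
  if (state == RMContainerState.ALLOCATED
      || state == RMContainerState.ACQUIRED) {
    // The AM never launched this container, so re-add the original
    // resource request instead of silently losing it.
    recoverResourceRequest(app, rmContainer);
  }
}
{code}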

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster 
 capacity.
 The jobA task that uses queue b's capacity is preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately as per the preemption policy.
 Here an ACQUIRED at KILLED invalid-state exception occurred when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2096) testQueueMetricsOnRMRestart has race condition

2014-05-23 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2096:
---

 Summary: testQueueMetricsOnRMRestart has race condition
 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
 fails randomly because of a race condition.
The test validates that metrics are incremented, but does not wait for all 
transitions to finish before checking for the values.
It also resets metrics after kicking off recovery of the second RM. The metrics 
that need to be incremented race with this reset, causing the test to fail randomly.
We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2096) testQueueMetricsOnRMRestart has race condition

2014-05-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-2096:
---

Assignee: Anubhav Dhoot

 testQueueMetricsOnRMRestart has race condition
 --

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of the second RM. The metrics 
 that need to be incremented race with this reset, causing the test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2096) testQueueMetricsOnRMRestart has race condition

2014-05-23 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2096:


Attachment: YARN-2096.patch

Fixed two race conditions by:
1) waiting for the appropriate transitions before checking metrics, and
2) resetting metrics before the events are triggered.
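
A hedged sketch of the first fix, assuming GenericTestUtils.waitFor is used in the test; the metrics object (qm), its accessor, and the expected value are illustrative:
{code}
// Poll until the expected metric value shows up instead of asserting
// immediately after firing the events.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return qm.getAppsCompleted() == expectedAppsCompleted;
  }
}, 100, 10000);
{code}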

 testQueueMetricsOnRMRestart has race condition
 --

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of the second RM. The metrics 
 that need to be incremented race with this reset, causing the test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006938#comment-14006938
 ] 

Zhijie Shen commented on YARN-1937:
---

bq. A meta comment - may be this isn't a RESTy way of rejecting requests?

The situation here is that we may not deny the whole request; only some of the 
entities may fail to be put. Otherwise, we could simply return an HTTP 403. 
In this case, however, we have to build a customized response, don't we?

bq. We should also make this a public enum so that users know what 
system-filters exist
bq. Do we really need TimelinePutError.SYSTEM_FILTER_CONFLICT? Similarly 
injectOwnerInfo. Or is it better to simply ignore the overriding filters? Not 
sure, thinking aloud.

I intentionally don't allow users to set or modify the system filters, to prevent 
them from affecting the system logic. For example, if user1 posts an entity 
with ENTITY_OWNER = user2, the posted entity will never be accessible 
by user1. Therefore the enums don't need to be visible to users. However, in 
the documentation we can explicitly tell users which filter names are reserved 
by the timeline service and that they shouldn't use them.

bq. Agree with Varun about admins. You should simply start respecting 
YarnConfiguration.YARN_ADMIN_ACL. See ApplicationACLsManager for e.g and reuse 
AdminACLsManager here itself.

Sure. I have already filed a ticket about adding admin ACLs. How about working on 
that issue separately?
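
For context, a hedged sketch of the partial-rejection shape being discussed, using TimelinePutResponse/TimelinePutError; the SYSTEM_FILTER_CONFLICT code is the one named in the discussion above (part of the patch under review, not necessarily committed):
{code}
TimelinePutResponse response = new TimelinePutResponse();
TimelinePutResponse.TimelinePutError error =
    new TimelinePutResponse.TimelinePutError();
error.setEntityId(entity.getEntityId());
error.setEntityType(entity.getEntityType());
// Mark just this entity as rejected; the rest of the request succeeds.
error.setErrorCode(TimelinePutResponse.TimelinePutError.SYSTEM_FILTER_CONFLICT);
response.addError(error);
{code}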

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2059) Extend access control for admin acls

2014-05-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2059:
--

Summary: Extend access control for admin acls  (was: Extend access control 
for admin and configured user/group list)

 Extend access control for admin acls
 

 Key: YARN-2059
 URL: https://issues.apache.org/jira/browse/YARN-2059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2059) Extend access control for admin acls

2014-05-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2059:
--

Target Version/s: 2.5.0

 Extend access control for admin acls
 

 Key: YARN-2059
 URL: https://issues.apache.org/jira/browse/YARN-2059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-23 Thread Yi Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Tian updated YARN-2083:
--

Attachment: YARN-2083.patch

Add testcase for this issue

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, and use it 
 instead of fitsIn in this case.
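
A minimal sketch of the proposed helper, assuming the 2.x Resource accessors; this reflects the reporter's suggestion, not committed code:
{code}
// Strict variant of fitsIn: returns false when the queue's used resource
// has already reached the max resource in either dimension.
public static boolean fitsInWithoutEqual(Resource smaller, Resource bigger) {
  return smaller.getMemory() < bigger.getMemory()
      && smaller.getVirtualCores() < bigger.getVirtualCores();
}
{code}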



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-23 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007024#comment-14007024
 ] 

Rohith commented on YARN-1365:
--

Hi Anubhav, 
  One comment on the patch. 
* Notifying the scheduler of APP_ATTEMPT_ADDED from RMApp leads to an 
InvalidStateTransition exception for RMAppAttempt. Can this be handled in 
RMAppAttemptImpl#AttemptRecoveredTransition? Since all attempts are recovered 
synchronously during recovery of the RMApp, the RMAppAttempt state is moved to 
LAUNCHED before the scheduler is notified.
{noformat}
  // Let scheduler know about this attempt so it can allow AM to register
  boolean disableTransferState = false;
  app.handler.handle(new AppAttemptAddedSchedulerEvent(app.currentAttempt
  .getAppAttemptId(), disableTransferState));
{noformat}

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-23 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.3.patch

I updated the patch with the below changes.

   bq. Pending releases - AM forgets about a request to release once its made. 
We will have to reissue a release request after RM restart 
  FIXED
   bq. Blacklisting has logic in ignoreBlacklisting to ignore it if we cross a 
threshold.
  FIXED 
   bq. There a few places where the line exceeds 80 chars
  Even after formatting, I could not reduce these to less than 80 chars.
   Ex: Line 209 in RMContainerRequestor,
       Line 267 in AMRMClientImpl

Apart from the above fixes, the other changes are:
* AMRMClient
**  AMRMClient maintains blacklisted nodes. These will be sent back to the RM on resync.
**  Added a test for this functionality.

* MapReduce
** Added a test that applies the YARN-1365 patch. To run this test, the 
YARN-1365 patch must be applied first.

Please review the patch.

 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
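
A hedged sketch of the AM-side handling described above, assuming the resync is signalled via AMCommand.AM_RESYNC in the AllocateResponse; the helper for resending outstanding requests is illustrative:
{code}
AllocateResponse response = scheduler.allocate(allocateRequest);
if (response.getAMCommand() == AMCommand.AM_RESYNC) {
  // Reset the allocate RPC sequence number and resend the entire
  // outstanding request instead of shutting down.
  lastResponseId = 0;
  resendAllOutstandingRequests();  // illustrative helper
} else {
  lastResponseId = response.getResponseId();
}
{code}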



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007148#comment-14007148
 ] 

Wangda Tan commented on YARN-1408:
--

I think the container should be preempt-able when it's in the ALLOCATED state. 
Regarding the race condition mentioned by [~jianhe], can we add a resource 
request to RMContainer? When an allocate/kill happens within one AM heartbeat, 
we can add the resource request back.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster 
 capacity.
 The jobA task that uses queue b's capacity is preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately as per the preemption policy.
 Here an ACQUIRED at KILLED invalid-state exception occurred when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007269#comment-14007269
 ] 

Steve Loughran commented on YARN-2092:
--

I think this should be a wontfix, as the underlying problem was trying to get 
an older version of Jackson onto the classpath. Admittedly, this probably 
worked on 2.2-2.4, but that is because the code was pushing up the same version 
of Jackson that was already there. If Tez weren't trying to push up any Jackson 
JARs, but instead took what was there, it would work (which is essentially what 
has been done).

Unless/until we can isolate YARN apps from everything on the classpath other 
than org.apache.hadoop.*, this problem will keep arising, which implies we need 
OSGi support in YARN.

 Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
 2.5.0-SNAPSHOT
 

 Key: YARN-2092
 URL: https://issues.apache.org/jira/browse/YARN-2092
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 Came across this when trying to integrate with the timeline server. Using a 
 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
 jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007281#comment-14007281
 ] 

Sangjin Lee commented on YARN-2092:
---

+1 on exploring the OSGi option.

Until/unless it happens, one option might be to consider an expanded use for 
the mapreduce.job.classloader config? Currently it only works within a MR app 
(AM and tasks). However, one could argue that the functionality it provides is 
a generic one in that it creates a somewhat isolated class space. Perhaps we 
could rename the job classloader to something like an app classloader, and make 
it available anywhere user code needs to run in isolation.
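
For reference, a minimal sketch of how that config is enabled today for an MR job; the property name is taken from the comment above, and everything else is illustrative:
{code}
Configuration conf = new Configuration();
// Run the MR AM and tasks with an isolated job classloader so user jars
// don't clash with Hadoop's own dependencies (e.g. Jackson).
conf.setBoolean("mapreduce.job.classloader", true);
{code}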

 Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
 2.5.0-SNAPSHOT
 

 Key: YARN-2092
 URL: https://issues.apache.org/jira/browse/YARN-2092
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 Came across this when trying to integrate with the timeline server. Using a 
 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
 jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007284#comment-14007284
 ] 

Sangjin Lee commented on YARN-2092:
---

... user code or any code that needs to run in isolation.

 Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
 2.5.0-SNAPSHOT
 

 Key: YARN-2092
 URL: https://issues.apache.org/jira/browse/YARN-2092
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 Came across this when trying to integrate with the timeline server. Using a 
 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
 jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007291#comment-14007291
 ] 

Hitesh Shah commented on YARN-2092:
---

Isolating apps is just one aspect. The bigger issue is the provisioning of 
thinner client-api jars so that the Hadoop internals and their dependencies do 
not need to be pulled into the classpath of an app.


 Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
 2.5.0-SNAPSHOT
 

 Key: YARN-2092
 URL: https://issues.apache.org/jira/browse/YARN-2092
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 Came across this when trying to integrate with the timeline server. Using a 
 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
 jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-23 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2096:
---

Summary: Race in TestRMRestart#testQueueMetricsOnRMRestart  (was: 
testQueueMetricsOnRMRestart has race condition)

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of second RM. The metrics 
 that need to be incremented race with this reset causing test to fail 
 randomly.
 We need to wait for the right transitions.
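 A minimal sketch of the kind of wait this calls for (a hypothetical test 
 helper, assuming QueueMetrics#getAppsSubmitted; the actual patch may do this 
 differently):
 {code}
 // Sketch only: poll for the expected metric value instead of asserting
 // immediately, so the check does not race with in-flight transitions.
 private static void waitForAppsSubmitted(QueueMetrics metrics, int expected)
     throws InterruptedException {
   long deadline = System.currentTimeMillis() + 10000;
   while (metrics.getAppsSubmitted() != expected
       && System.currentTimeMillis() < deadline) {
     Thread.sleep(100);
   }
   org.junit.Assert.assertEquals(expected, metrics.getAppsSubmitted());
 }
 {code}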



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-23 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-596:
-

Attachment: YARN-596.patch

Updated with a new patch to address Sandy's suggestions.

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007319#comment-14007319
 ] 

Tsuyoshi OZAWA commented on YARN-2096:
--

Thank you for taking this JIRA, Anubhav. I also faced this problem when 
reviewing YARN-1365. I'll try to run the tests again and again with your patch.

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of second RM. The metrics 
 that need to be incremented race with this reset causing test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-23 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007338#comment-14007338
 ] 

Karthik Kambatla commented on YARN-2096:


Looks good to me. +1. 

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of second RM. The metrics 
 that need to be incremented race with this reset causing test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-23 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007338#comment-14007338
 ] 

Karthik Kambatla edited comment on YARN-2096 at 5/23/14 4:41 PM:
-

Looks good to me. +1 pending Jenkins.


was (Author: kkambatl):
Looks good to me. +1. 

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of second RM. The metrics 
 that need to be incremented race with this reset causing test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007358#comment-14007358
 ] 

Tsuyoshi OZAWA commented on YARN-1365:
--

Hi [~rohithsharma], can you clarify the case in which the InvalidStateTransition 
exception is caused? IIUC, the recovery path is as follows:
1. RMAppManager#recoverApplication() is invoked.
2. RMAppEvent(appId, RMAppEventType.RECOVER) is handled and 
RMAppRecoveredTransition() is invoked.
3. AppAttemptAddedSchedulerEvent() is handled, i.e. APP_ATTEMPT_ADDED is processed.

I think this path works well and that the test case included in the patch covers 
it. Please correct me if I'm wrong. Thanks. 

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007363#comment-14007363
 ] 

Tsuyoshi OZAWA commented on YARN-2096:
--

The change looks good to me too (non-binding).

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of second RM. The metrics 
 that need to be incremented race with this reset causing test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007385#comment-14007385
 ] 

Vinod Kumar Vavilapalli commented on YARN-2049:
---

+1, looks good. Checking this in.

 Delegation token stuff for the timeline sever
 -

 Key: YARN-2049
 URL: https://issues.apache.org/jira/browse/YARN-2049
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
 YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-23 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007397#comment-14007397
 ] 

Varun Vasudev commented on YARN-2049:
-

+1, patch looks good.

 Delegation token stuff for the timeline sever
 -

 Key: YARN-2049
 URL: https://issues.apache.org/jira/browse/YARN-2049
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
 YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007414#comment-14007414
 ] 

Varun Vasudev commented on YARN-1937:
-

+1 on the latest patch.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007419#comment-14007419
 ] 

Hadoop QA commented on YARN-2096:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646464/YARN-2096.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3795//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3795//console

This message is automatically generated.

 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of second RM. The metrics 
 that need to be incremented race with this reset causing test to fail 
 randomly.
 We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1936) Secured timeline client

2014-05-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007426#comment-14007426
 ] 

Vinod Kumar Vavilapalli commented on YARN-1936:
---

+1, this looks good. Will check this in if Jenkins says okay.

 Secured timeline client
 ---

 Key: YARN-1936
 URL: https://issues.apache.org/jira/browse/YARN-1936
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch


 TimelineClient should be able to talk to the timeline server with kerberos 
 authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1936) Secured timeline client

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007460#comment-14007460
 ] 

Hadoop QA commented on YARN-1936:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646445/YARN-1936.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

  org.apache.hadoop.yarn.client.TestRMAdminCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3796//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3796//console

This message is automatically generated.

 Secured timeline client
 ---

 Key: YARN-1936
 URL: https://issues.apache.org/jira/browse/YARN-1936
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch


 TimelineClient should be able to talk to the timeline server with kerberos 
 authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1936) Secured timeline client

2014-05-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007472#comment-14007472
 ] 

Zhijie Shen commented on YARN-1936:
---

Again, the test failure is not related.

 Secured timeline client
 ---

 Key: YARN-1936
 URL: https://issues.apache.org/jira/browse/YARN-1936
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch


 TimelineClient should be able to talk to the timeline server with kerberos 
 authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-05-23 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007485#comment-14007485
 ] 

Ivan Mitic commented on YARN-1063:
--

I have already reviewed a version of the initial patch (YARN-1063.patch). 
Copy-pasting the full list of comments here for documentation purposes on the Jira. 

First round:

1. winutils.h: You have a duplicate EnablePrivilege declaration. Please remove 
the one with BOOL.
2. winutils.h: Convention in the file is to use CamelCased function names. 
Please name your functions appropriately. Also, a nit, no space between '(' and 
function args. Same comment across the board.
3. libwinutils.c#EnablePrivilege:
{code}
{

ReportErrorCode(LLookupPrivilegeValue, GetLastError());

CloseHandle(hToken);
return GetLastError();

}

{code}
You shouldn't be calling GetLastError() twice above. CloseHandle() might reset 
it to 0 or some other value. Can you change all error code paths in this 
function to first assign GetLastError() to a local variable, and then log it 
and do other things. E.g.
{code}
{

dwErrorCode = GetLastError();
ReportErrorCode(LLookupPrivilegeValue, dwErrorCode);

CloseHandle(hToken);
return dwErrorCode;

}

{code}
3. Something is wrong with this comment
{code}
// Function: assignLsaString

//

// Description:

//  fills in values of LSA_STRING struct to point to a string buffer

{code}
4. void assignLsaString( __in LSA_STRING * target, __in const char *strBuf )
Is target an __inout parameter?
5. libwinutils.c: Mixed tabs and spaces. Please use a 2-space indent across the 
board.
6. libwinutils.c: authentication pacakage - typo
7. libwinutils.c: Should the constant be named MICROSOFT_KERBEROS_NAME, or is 
there something else more appropriate?
8. libwinutils.c: GetNameFromLogonToken: You can assert that first 
GetTokenInformation returns false. Don't do assert(GetTokenInformation() == 
FALSE) :)
9. I don't believe that calloc() sets the last error. If the return value is 
NULL, you should assume/error-with ERROR_NOT_ENOUGH_MEMORY. Applies to all 
places.
10. libwinutils.c: LookupAccountSid: Do you need to allocate userNameSize+1 and 
domainNameSize+1 buffer sizes, or is this already accounted for?
11. task.c: You are not checking the result of GetCurrentDirectory(). If you 
expect the method to always succeed, you can assert that the result > 0.
12. task.c: Please keep the CreateProcessAsUser and old CreateProcess code paths 
separate. I don't think it is trivial to prove that the new code with 
CreateProcessAsUser has exactly the same semantics as the old code.
13. task.c: I don't understand the need to TerminateJobObject when 
CreateProcessAsUser fails. Why wouldn't the regular return code path exit the 
process with a non-zero code?
14. Consider using dwErrorCode in your functions to track status of the win 
error codes.
15. task.c: createTaskAsUser: Do you need to check whether lsaHandle is valid 
before sending it to unregisterWithLsa? You can initialize it to INVALID_HANDLE_VALUE.
16. task.c: Task: Can you please rename size to cmdLineSize and argLen to 
crtArgLen. Btw, is size needed?
17. task.c: Task: Can you please rename ARGC_GROUPID to ARGC_JOBOBJECTNAME or 
something alike.
18. task.c: TaskUsage: Your command line format does not seem to be valid. You 
are missing jobobject and pidfile.
19. task.c: 
{code}
cmdLine = argv[ARGC_COMMAND];
if (argc > ARGC_COMMAND_ARGS) {
  crtArg = ARGC_COMMAND;
  insertHere = buffer;
  while (crtArg < argc) {
    argLen = wcslen(argv[crtArg]);
    wcscat(insertHere, argv[crtArg]);
    insertHere += argLen;
    size -= argLen;
    insertHere[0] = L' ';
    insertHere += 1;
    size -= 1;
    insertHere[0] = 0;
    ++crtArg;
  }
  cmdLine = buffer;
}

{code}
Do you mind adding a short comment on what you're doing.
20.
bq. add PIDFILE argument to task createAsUser. Pidfile must be created by the 
task controller just before launching the process.
Can you comment on the rationale? 
21.
bq. accept arbitrary arguments to pass to the process launched. We use the 
launcher for both container and localizer and that requires variable arguments.
How does this work today? Is the localizer using something else?
22. We haven't written a whole lot of unit tests for winutils so far (check 
testwinutils.java). I'll let you make the call on whether this could/should be 
unit-tested and, if so, what is appropriate. I will sign off anyhow.

Second round:

1. I still see GetLastError() being called twice in EnablePrivilege. Can you 
please use dwErrorCode instead of the second call.
{code}
dwErrCode = GetLastError();
ReportErrorCode(LOpenProcessToken, GetLastError());

{code}
2. You missed changing the SAL annotation in the header file
{code}
void AssignLsaString(__inout LSA_STRING * target, __in const char *strBuf)

{code}
3. 

[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-05-23 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007490#comment-14007490
 ] 

Ivan Mitic commented on YARN-1063:
--

Thanks Remus for addressing all comments in the latest patch 
(YARN-1063.4.patch). Looks good to me, +1

I will give others a couple of days to provide additional feedback and then 
commit the patch. 

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and need to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process without 
 granting rights to other processes launched on the same machine, but found 
 this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard Windows APIs. At some point in the future, 
 Windows may support functionality similar to BSD jails or Linux containers; 
 at that point, support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment for the new logon.
 # Launch the new process in a job with the task name specified and using the 
 created logon.
 # Wait for the JOB to exit.
 h2. Future work:
 The following work was scoped out of this check in:
 * Support for non-domain users or machines that are not domain-joined.
 * Support for privilege isolation by running the task launcher in a high 
 privilege service with access over an ACLed named pipe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1936) Secured timeline client

2014-05-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007534#comment-14007534
 ] 

Zhijie Shen commented on YARN-1936:
---

Also did some validation for the new client patch on my local secured cluster. 
It still works correctly.

 Secured timeline client
 ---

 Key: YARN-1936
 URL: https://issues.apache.org/jira/browse/YARN-1936
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch


 TimelineClient should be able to talk to the timeline server with kerberos 
 authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007555#comment-14007555
 ] 

Hadoop QA commented on YARN-1937:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646469/YARN-1937.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3797//console

This message is automatically generated.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-05-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007558#comment-14007558
 ] 

Vinod Kumar Vavilapalli commented on YARN-2070:
---

Given that, should we consider dropping this patch completely?

 DistributedShell publishes unfriendly user information to the timeline server
 -

 Key: YARN-2070
 URL: https://issues.apache.org/jira/browse/YARN-2070
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
  Labels: newbie
 Attachments: YARN-2070.patch


 Below is the code that uses the string form of the current user object as the 
 user value:
 {code}
 entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
 .toString());
 {code}
 When we use Kerberos authentication, this outputs the full name, such as 
 zjshen/localhost@LOCALHOST (auth.KERBEROS), which is not user friendly for 
 searching by the primary filters. It's better to use shortUserName instead.
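 A hedged sketch of the suggested change (assuming 
 UserGroupInformation#getShortUserName, which returns the short login name):
 {code}
 // Sketch only: publish the short user name so the primary filter is easier
 // to search than the full Kerberos principal string.
 entity.addPrimaryFilter("user",
     UserGroupInformation.getCurrentUser().getShortUserName());
 {code}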



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2097) Documentation: health check return status

2014-05-23 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-2097:
--

 Summary: Documentation: health check return status
 Key: YARN-2097
 URL: https://issues.apache.org/jira/browse/YARN-2097
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Allen Wittenauer


We need to document that the output of the health check script is ignored on 
non-0 exit status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism

2014-05-23 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007562#comment-14007562
 ] 

Ming Ma commented on YARN-2082:
---

Folks, thanks for the feedback and the pointers to other jiras; quite useful. We 
have been discussing internally how to mitigate log aggregation's impact on the 
NN for some time. Here is more context along with comments.

1. We discussed Vinod's suggestion of post-processing before. The issue is that 
the write RPC hit on the NN has already happened by then. Moreover, 
post-processing introduces additional hits on the NN.

2. Agree that HDFS needs to be more scalable. Some improvements have been done 
and some are still being worked on. My opinion is we should do both if possible: 
improve HDFS and improve how applications use HDFS.

3. Reducing the impact on the NN in large clusters is our primary motivation, 
similar to what Jason mentioned in YARN-1440. We use the approach mentioned by 
[~jira.shegalov] and [~ctrezzo] in YARN-221 to mitigate the issue.

4. YARN-1440 also suggested making it pluggable, but it seems the primary 
motivation there is to make it easy for other tools to integrate with yarn logs. 
If that is the case, we have two requirements for making log aggregation 
pluggable: easy integration with other tools and reduced pressure on the NN.

5. We discussed writing logs to a key-value store before. At that point, we 
didn't go with the approach because it makes yarn depend on an external 
component like HBase. Based on recent discussion with [~jira.shegalov] and 
[~ctrezzo], it now sounds like a reasonable approach: a) the timeline store 
already has a dependency on HBase, and b) the size of the logs is small and 
suitable for the HBase scenario.

6. Regarding Zhijie's suggestion of using the timeline store, that sounds like 
an interesting idea, provided the timeline store is highly available.

7. Regarding Steve's comment on long-running job support: it wasn't our primary 
goal; we just want to make sure that if we do end up changing log aggregation, 
the framework supports that scenario as well. If there is a long-running 
container and we rotate the logs, is there a plan to aggregate them before the 
container finishes?

 Support for alternative log aggregation mechanism
 -

 Key: YARN-2082
 URL: https://issues.apache.org/jira/browse/YARN-2082
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Ming Ma

 I will post a more detailed design later. Here is a brief summary; I would 
 like to get early feedback.
 Problem statement:
 The current implementation of log aggregation creates one HDFS file per 
 {application, nodemanager} pair. These files are relatively small, in the 
 range of 1-2 MB. In a large cluster with lots of applications and many 
 nodemanagers, this ends up creating lots of small files in HDFS, which puts 
 pressure on the HDFS NN in the following ways:
 1. It increases NN memory usage. This is mitigated by having the history 
 server delete old log files in HDFS.
 2. Runtime RPC hit on HDFS. Each log aggregation file introduces several NN 
 RPCs such as create, getAdditionalBlock, complete, and rename. When the cluster 
 is busy, this RPC load has an impact on NN performance.
 In addition, to support non-MR applications on YARN, we might need to support 
 aggregation for long-running applications.
 Design choices:
 1. Don't aggregate all the logs, as in YARN-221.
 2. Create a dedicated HDFS namespace used only for log aggregation.
 3. Write logs to some key-value store like HBase. HBase's RPC hit on the NN 
 will be much smaller.
 4. Decentralize application-level log aggregation to NMs. All logs for a 
 given application are aggregated first by a dedicated NM before they are 
 pushed to HDFS.
 5. Have each NM aggregate logs on a regular basis; each of these log files 
 will have data from different applications, and there needs to be some index 
 for quick lookup.
 Proposal:
 1. Make yarn log aggregation pluggable for both the read and write paths. Note 
 that Hadoop FileSystem provides an abstraction, and we could ask alternative 
 log aggregators to implement a compatible FileSystem, but that seems to be 
 overkill.
 2. Provide a log aggregation plugin that writes to HBase. The schema design 
 needs to support efficient reads on a per-application as well as a 
 per-application+container basis; in addition, it shouldn't create hotspots in 
 a cluster where certain users might create more jobs than others. For example, 
 we could use hash($user + $applicationId) + containerId as the row key.
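 Purely as an illustration of that row key idea (the names and helper are 
 hypothetical, not part of any patch):
 {code}
 // Hypothetical sketch: a hash(user + applicationId) prefix spreads rows across
 // regions (avoiding hotspots for heavy users), while containerId keeps
 // per-container reads cheap.
 static byte[] logRowKey(String user, String applicationId, String containerId) {
   String prefix = Integer.toHexString((user + applicationId).hashCode());
   return (prefix + "!" + containerId)
       .getBytes(java.nio.charset.StandardCharsets.UTF_8);
 }
 {code}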



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2097) Documentation: health check return status

2014-05-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2097:
---

Component/s: nodemanager

 Documentation: health check return status
 -

 Key: YARN-2097
 URL: https://issues.apache.org/jira/browse/YARN-2097
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Allen Wittenauer
  Labels: newbie

 We need to document that the output of the health check script is ignored on 
 non-0 exit status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2075) TestRMAdminCLI consistently fail on trunk

2014-05-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2075:
--

Affects Version/s: 2.5.0
   3.0.0

 TestRMAdminCLI consistently fail on trunk
 -

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1937:
--

Attachment: YARN-1937.5.patch

Fix the conflicts

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch, YARN-1937.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007592#comment-14007592
 ] 

Hadoop QA commented on YARN-596:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646537/YARN-596.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3798//console

This message is automatically generated.

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2090) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2014-05-23 Thread Victor Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007603#comment-14007603
 ] 

Victor Kim commented on YARN-2090:
--

I figured out when/where it happens.
If you use kerberized user impersonation when submitting an MR job, we actually 
submit as a user (let's say john) via yarn (or whatever kerberos principal, 
whose user is also present on the box). During the shuffle phase, in public void 
messageReceived(ChannelHandlerContext ctx, MessageEvent evt), ShuffleHandler 
tries to read/write the map output file as john, who ran the MR job. This does 
not work in the case of user impersonation, when john is not present on the 
local box (e.g. he comes from ActiveDirectory), because the file on the local FS 
cannot be read.

The fix is to use the JVM process owner instead of the user, i.e. 
System.getProperty("user.name"), in the two methods populateHeaders and 
sendMapOutput:

{code:title=ShuffleHandler.java|borderStyle=solid}
protected void populateHeaders(List<String> mapIds, String outputBaseStr,
    String user, int reduce, HttpRequest request, HttpResponse response,
    boolean keepAliveParam, Map<String, MapOutputInfo> mapOutputInfoMap)
    throws IOException {
  // Some code here..
  String processOwner = System.getProperty("user.name");
  MapOutputInfo outputInfo = getMapOutputInfo(base, mapId, reduce, processOwner);
  // Some code here..
}
{code}
{code:title=ShuffleHandler.java|borderStyle=solid}
protected ChannelFuture sendMapOutput(ChannelHandlerContext ctx, Channel ch,
    String user, String mapId, int reduce, MapOutputInfo mapOutputInfo)
    throws IOException {
  // Some code here..
  String processOwner = System.getProperty("user.name");
  spill = SecureIOUtils.openForRandomRead(spillfile, "r", processOwner, null);
  // Some code here..
}
{code}

 If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
 phase
 

 Key: YARN-2090
 URL: https://issues.apache.org/jira/browse/YARN-2090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.4.0
 Environment: hadoop: 2.4.0.2.1.2.0
Reporter: Victor Kim
Priority: Critical

 I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. 
 Kerberos is enabled, and I have hdfs, yarn, and mapred principals/keytabs. 
 The ResourceManager and NodeManagers are run under the yarn user, using the 
 yarn Kerberos principal. 
 Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one 
 having Kerberos principal on all boxes). Result: job successfully completed.
 Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. 
 Result: Map tasks are completed SUCCESSfully, Reduce task fails with 
 ShuffleError Caused by: java.io.IOException: Exceeded 
 MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
 The use case with user impersonation used to work on earlier versions, 
 without YARN (with JTTT).
 I found similar issue with Kerberos AUTH involved here: 
 https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
 And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as 
 resolved, which is not the case when Kerberos Authentication is enabled.
 The exception trace from YarnChild JVM:
 2014-05-21 12:49:35,687 FATAL [fetcher#3] 
 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed 
 with too many fetch failures and insufficient progress!
 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in fetcher#3
 at 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; 
 bailing-out.
 at 
 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
 at 
 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
 at 
 

[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007605#comment-14007605
 ] 

Hudson commented on YARN-2081:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 TestDistributedShell fails after YARN-1962
 --

 Key: YARN-2081
 URL: https://issues.apache.org/jira/browse/YARN-2081
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 3.0.0, 2.4.1
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2081.patch


 java.lang.AssertionError: expected:1 but was:0
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at org.junit.Assert.assertEquals(Assert.java:542)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007607#comment-14007607
 ] 

Hudson commented on YARN-2089:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are 
missing audience annotations. (Zhihai Xu via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596765)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


 FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
 audience annotations
 ---

 Key: YARN-2089
 URL: https://issues.apache.org/jira/browse/YARN-2089
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot
Assignee: zhihai xu
  Labels: newbie
 Fix For: 2.5.0

 Attachments: yarn-2089.patch


 We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
 annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1936) Secured timeline client

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007612#comment-14007612
 ] 

Hudson commented on YARN-1936:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-1936. Added security support for the Timeline Client. Contributed by 
Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1597153)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineDelegationTokenSecretManagerService.java


 Secured timeline client
 ---

 Key: YARN-1936
 URL: https://issues.apache.org/jira/browse/YARN-1936
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch


 TimelineClient should be able to talk to the timeline server with kerberos 
 authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007609#comment-14007609
 ] 

Hudson commented on YARN-1962:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Timeline server is enabled by default
 -

 Key: YARN-1962
 URL: https://issues.apache.org/jira/browse/YARN-1962
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.4.1

 Attachments: YARN-1962.1.patch, YARN-1962.2.patch


 Since the Timeline server is not yet mature or secured, enabling it by default 
 might create some confusion.
 We were playing with 2.4.0 and found a lot of exceptions in the distributed 
 shell example related to connection-refused errors. Btw, we didn't run the TS 
 because it is not secured yet.
 It is possible to explicitly turn it off through the yarn-site config, but in 
 my opinion this extra change for the new service is not worth it at this 
 point.
 This JIRA is to turn it off by default.
 If there is agreement, I can put up a simple patch for this.
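 For reference, a minimal sketch of turning it off explicitly (assuming the 
 yarn.timeline-service.enabled property; in practice this would simply be set 
 in yarn-site.xml):
 {code}
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 // Illustration only: disable the timeline service via configuration,
 // equivalent to yarn.timeline-service.enabled=false in yarn-site.xml.
 YarnConfiguration conf = new YarnConfiguration();
 conf.setBoolean("yarn.timeline-service.enabled", false);
 {code}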
 {noformat}
 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
 from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
 Caused by: java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
   at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR 
 impl.TimelineClientImpl: Failed to get the response from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
   at 
 

[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007610#comment-14007610
 ] 

Hudson commented on YARN-2049:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-2049. Added delegation-token support for the Timeline Server. Contributed 
by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1597130)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineDelegationTokenResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientTimelineSecurityInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineAuthenticationConsts.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenOperation.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenSelector.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.SecurityInfo
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineAuthenticationFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineAuthenticationFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineClientAuthenticationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 

[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007608#comment-14007608
 ] 

Hudson commented on YARN-2017:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596753)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 

[jira] [Updated] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-05-23 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2026:
-

Attachment: YARN-2026-v1.txt

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt


 While using hierarchical queues in the fair scheduler, there are a few scenarios 
 where we have seen a leaf queue with the least fair share take the majority of 
 the cluster and starve a sibling parent queue which has a greater weight/fair 
 share, and preemption doesn't kick in to reclaim resources.
 The root cause seems to be that the fair share of a parent queue is distributed 
 to all its children irrespective of whether a child is an active or an inactive (no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue (with a high 
 fair share), each child queue's fair share becomes really low. As a result, when 
 only a few of these child queues have apps running, they reach their *tiny* fair 
 share quickly and preemption doesn't happen even if other leaf 
 queues (non-sibling) are hogging the cluster.
 This can be solved by dividing the fair share of a parent queue only among its active 
 child queues.
 Here is an example describing the problem and the proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is a parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues: 
 root.HighPriorityQueue.childQ(1..10)
 The above config results in root.HighPriorityQueue having an 80% fair share,
 and each of its ten child queues would have an 8% fair share. Preemption would 
 happen only if a child queue's usage is below 4% (0.5*8=4). 
 Let's say at the moment no apps are running in any of 
 root.HighPriorityQueue.childQ(1..10) and a few apps are running in 
 root.lowPriorityQueue, which is taking up 95% of the cluster.
 Up to this point, the behavior of FS is correct.
 Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% 
 of the cluster. It would get only the available 5% of the cluster, and 
 preemption wouldn't kick in since its usage is above 4% (half its fair share). This is bad 
 considering childQ1 is under a high-priority parent queue which has an *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers, we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1 = 5%*
 This can be solved by distributing a parent's fair share only to active 
 queues.
 So in the example above, since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Also note that a similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2, if childQ2 
 hogs the cluster. childQ2 can take up 95% of the cluster and childQ1 would be stuck 
 at 5% until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
 40%, which would ensure childQ1 gets up to 40% of resources if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007613#comment-14007613
 ] 

Hudson commented on YARN-1938:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596710)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-05-23 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007635#comment-14007635
 ] 

Ashwin Shankar commented on YARN-2026:
--

The attached patch addresses this issue by setting the fair share of inactive parent 
and leaf queues (queues which have no running apps) to zero.
The patch contains unit tests to illustrate the behavior. I manually tested in a 
pseudo-distributed cluster, verified that fair shares are distributed only to 
active queues, and found that preemption behavior/fairness is much better.
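As a rough illustration of the approach (hypothetical types and method names, not the actual FairScheduler internals), distributing a parent's share only across children that currently have running apps, and zeroing the rest, could look like:

{code}
// Hedged sketch of "distribute fair share only to active child queues".
// Queue, isActive() and setFairShare() are hypothetical stand-ins for the
// real FairScheduler types.
import java.util.ArrayList;
import java.util.List;

class Queue {
  private final String name;
  private final double weight;
  private final boolean active;   // true if the queue has running apps
  private double fairShare;       // computed share, as a fraction of the cluster

  Queue(String name, double weight, boolean active) {
    this.name = name; this.weight = weight; this.active = active;
  }
  boolean isActive() { return active; }
  double getWeight() { return weight; }
  void setFairShare(double share) { this.fairShare = share; }
  double getFairShare() { return fairShare; }
  String getName() { return name; }
}

public class ActiveQueueFairShare {
  // Split parentShare among active children by weight; inactive children get 0.
  static void distribute(double parentShare, List<Queue> children) {
    double activeWeight = 0;
    for (Queue q : children) {
      if (q.isActive()) activeWeight += q.getWeight();
    }
    for (Queue q : children) {
      if (q.isActive() && activeWeight > 0) {
        q.setFairShare(parentShare * q.getWeight() / activeWeight);
      } else {
        q.setFairShare(0);          // inactive queues get no fair share
      }
    }
  }

  public static void main(String[] args) {
    // Mirrors the example in the description: only childQ1 is active.
    List<Queue> children = new ArrayList<Queue>();
    children.add(new Queue("childQ1", 1, true));
    for (int i = 2; i <= 10; i++) {
      children.add(new Queue("childQ" + i, 1, false));
    }
    distribute(0.80, children);     // HighPriorityQueue's 80% share
    System.out.println("childQ1 fair share = " + children.get(0).getFairShare()); // 0.8
  }
}
{code}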


 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt


 While using hierarchical queues in the fair scheduler, there are a few scenarios 
 where we have seen a leaf queue with the least fair share take the majority of 
 the cluster and starve a sibling parent queue which has a greater weight/fair 
 share, and preemption doesn't kick in to reclaim resources.
 The root cause seems to be that the fair share of a parent queue is distributed 
 to all its children irrespective of whether a child is an active or an inactive (no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue (with a high 
 fair share), each child queue's fair share becomes really low. As a result, when 
 only a few of these child queues have apps running, they reach their *tiny* fair 
 share quickly and preemption doesn't happen even if other leaf 
 queues (non-sibling) are hogging the cluster.
 This can be solved by dividing the fair share of a parent queue only among its active 
 child queues.
 Here is an example describing the problem and the proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is a parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues: 
 root.HighPriorityQueue.childQ(1..10)
 The above config results in root.HighPriorityQueue having an 80% fair share,
 and each of its ten child queues would have an 8% fair share. Preemption would 
 happen only if a child queue's usage is below 4% (0.5*8=4). 
 Let's say at the moment no apps are running in any of 
 root.HighPriorityQueue.childQ(1..10) and a few apps are running in 
 root.lowPriorityQueue, which is taking up 95% of the cluster.
 Up to this point, the behavior of FS is correct.
 Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% 
 of the cluster. It would get only the available 5% of the cluster, and 
 preemption wouldn't kick in since its usage is above 4% (half its fair share). This is bad 
 considering childQ1 is under a high-priority parent queue which has an *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers, we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1 = 5%*
 This can be solved by distributing a parent's fair share only to active 
 queues.
 So in the example above, since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Also note that a similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2, if childQ2 
 hogs the cluster. childQ2 can take up 95% of the cluster and childQ1 would be stuck 
 at 5% until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
 40%, which would ensure childQ1 gets up to 40% of resources if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-05-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007640#comment-14007640
 ] 

Zhijie Shen commented on YARN-2070:
---

The situation here is that users are free to define their own filters, even if 
they duplicate the system info. The system info should be invisible to users, 
such that a user won't see, for example, ENTITY_OWNER in the response timeline 
data. Then, if users want to show the user information, they need to add it 
somewhere in the timeline entity/event, and this is what distributed shell does.

The problem here is that the complete UGI name is used as the user in the 
distributed shell. Therefore, we will see "zjshen/localhost@LOCALHOST 
(auth.KERBEROS)" or "zjshen (auth.SIMPLE)" in the user field. The 
authentication details are not very useful, IMO, and they will trouble users when 
they want to query the timeline entities by filtering on user.

 DistributedShell publishes unfriendly user information to the timeline server
 -

 Key: YARN-2070
 URL: https://issues.apache.org/jira/browse/YARN-2070
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
  Labels: newbie
 Attachments: YARN-2070.patch


 Below is the code that uses the string form of the current user object as the user 
 value.
 {code}
 entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
     .toString());
 {code}
 When we use Kerberos authentication, it's going to output the full name, such 
 as "zjshen/localhost@LOCALHOST (auth.KERBEROS)". It is not user friendly for 
 searching by the primary filters. It's better to use shortUserName instead.
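A minimal sketch of the suggested change (assuming the fix simply swaps in UserGroupInformation#getShortUserName; the actual YARN-2070 patch may differ):

{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class FriendlyUserFilter {
  // Publishes the short name, e.g. "zjshen", instead of the full UGI string
  // "zjshen/localhost@LOCALHOST (auth.KERBEROS)", so the primary filter stays
  // friendly for searching.
  public static void addUserFilter(TimelineEntity entity) throws IOException {
    entity.addPrimaryFilter("user",
        UserGroupInformation.getCurrentUser().getShortUserName());
  }
}
{code}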



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2059) Extend access control for admin acls

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007670#comment-14007670
 ] 

Hadoop QA commented on YARN-2059:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646475/YARN-2059.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3804//console

This message is automatically generated.

 Extend access control for admin acls
 

 Key: YARN-2059
 URL: https://issues.apache.org/jira/browse/YARN-2059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2059.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007666#comment-14007666
 ] 

Hadoop QA commented on YARN-2073:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646441/yarn-2073-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3799//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3799//console

This message is automatically generated.

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
 yarn-2073-3.patch, yarn-2073-4.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007669#comment-14007669
 ] 

Hadoop QA commented on YARN-2083:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646479/YARN-2083.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3800//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3800//console

This message is automatically generated.

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, and use it instead of 
 fitsIn in this case.
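A hedged sketch of what the proposed strict check could look like (the method name comes from the description above; this is not a committed API):

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class ResourceChecks {
  // Strict variant of the fitsIn check described above: returns false as soon
  // as usedResource has reached (==) the maxResource limit on any dimension,
  // so assignContainerPreCheck would stop handing out further containers.
  public static boolean fitsInWithoutEqual(Resource used, Resource max) {
    return used.getMemory() < max.getMemory()
        && used.getVirtualCores() < max.getVirtualCores();
  }
}
{code}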



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007676#comment-14007676
 ] 

Hadoop QA commented on YARN-2088:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3802//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3802//console

This message is automatically generated.

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2088.v1.patch


 Some fields(set,list) are added to proto builders many times, we need to 
 clear those fields before add, otherwise the result proto contains more 
 contents.
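As a generic illustration of the clear-before-add rule described above (FakeBuilder and its field names are hypothetical stand-ins, not the real GetApplicationsRequestProto.Builder):

{code}
import java.util.ArrayList;
import java.util.List;

// Generic illustration of "clear before re-adding" for repeated protobuf
// fields: generated builders expose clearXxx()/addAllXxx(), and skipping the
// clear makes a repeated field accumulate duplicates when the merge runs twice.
public class ClearBeforeAddSketch {
  static class FakeBuilder {
    final List<String> values = new ArrayList<String>();
    void clearValues() { values.clear(); }
    void addAllValues(List<String> vs) { values.addAll(vs); }
  }

  private final FakeBuilder builder = new FakeBuilder();
  private final List<String> localValues = new ArrayList<String>();

  // Without clearValues(), calling this twice would leave duplicates in the
  // builder -- the bug pattern described above.
  private void mergeLocalToBuilder() {
    if (!localValues.isEmpty()) {
      builder.clearValues();
      builder.addAllValues(localValues);
    }
  }

  public static void main(String[] args) {
    ClearBeforeAddSketch s = new ClearBeforeAddSketch();
    s.localValues.add("MAPREDUCE");
    s.mergeLocalToBuilder();
    s.mergeLocalToBuilder();                 // idempotent thanks to clearValues()
    System.out.println(s.builder.values);    // [MAPREDUCE], not [MAPREDUCE, MAPREDUCE]
  }
}
{code}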



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007685#comment-14007685
 ] 

Hadoop QA commented on YARN-1474:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646330/YARN-1474.16.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1279 javac 
compiler warnings (more than the trunk's current 1276 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 17 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3801//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3801//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3801//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3801//console

This message is automatically generated.

 Make schedulers services
 

 Key: YARN-1474
 URL: https://issues.apache.org/jira/browse/YARN-1474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.3.0, 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
 YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
 YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
 YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, 
 YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch


 Schedulers currently have a reinitialize but no start and stop.  Fitting them 
 into the YARN service model would make things more coherent.
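A hedged sketch of the direction, using the existing org.apache.hadoop.service.AbstractService lifecycle (a toy class for illustration, not the actual scheduler changes in the patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Toy example of fitting a scheduler into the YARN service model:
// reinitialize-style setup moves into serviceInit/serviceStart, and
// cleanup gets an explicit serviceStop.
public class ToyScheduler extends AbstractService {
  public ToyScheduler() {
    super(ToyScheduler.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // read scheduler configuration here (previously part of reinitialize)
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start update/monitoring threads here
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // stop threads and release resources here
    super.serviceStop();
  }
}
{code}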



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007686#comment-14007686
 ] 

Hadoop QA commented on YARN-1937:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646580/YARN-1937.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3803//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3803//console

This message is automatically generated.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch, YARN-1937.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2098) App priority support in Fair Scheduler

2014-05-23 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created YARN-2098:


 Summary: App priority support in Fair Scheduler
 Key: YARN-2098
 URL: https://issues.apache.org/jira/browse/YARN-2098
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar


Umbrella jira to track tasks needed for supporting app priorities
in fair scheduler. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007689#comment-14007689
 ] 

Hadoop QA commented on YARN-596:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646537/YARN-596.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3806//console

This message is automatically generated.

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share is just as likely to have containers preempted 
 as an application that is over its fair share.
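To make the flaw concrete, here is a hedged, self-contained sketch of the ordering described above (simplified; the numeric direction of priorities is assumed for illustration, and this is not the actual FairScheduler preemption code):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrderSketch {
  static class Candidate {
    final String app;
    final int requestedPriority;  // assumption: larger number = lower request priority
    Candidate(String app, int requestedPriority) {
      this.app = app; this.requestedPriority = requestedPriority;
    }
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<Candidate>();
    candidates.add(new Candidate("appA", 1));   // asked at high priority
    candidates.add(new Candidate("appB", 10));  // asked at low priority

    // Sorting purely by requested priority means appB's containers are
    // preempted first, regardless of how far each app is over its fair share,
    // so appA can "shield" itself by requesting at higher priorities.
    Collections.sort(candidates, new Comparator<Candidate>() {
      @Override
      public int compare(Candidate a, Candidate b) {
        return Integer.compare(b.requestedPriority, a.requestedPriority);
      }
    });
    System.out.println("first preempted: " + candidates.get(0).app); // appB
  }
}
{code}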



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2014-05-23 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2098:
-

Description: 
This jira is for supporting app priorities in fair schduler. 
AppSchedulable hard codes priority of apps to 1,we should
change this to get priority from ApplicationSubmissionContext.

  was:
Umbrella jira to track tasks needed for supporting app priorities
in fair scheduler. 


 App priority support in Fair Scheduler
 --

 Key: YARN-2098
 URL: https://issues.apache.org/jira/browse/YARN-2098
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 This jira is for supporting app priorities in fair schduler. 
 AppSchedulable hard codes priority of apps to 1,we should
 change this to get priority from ApplicationSubmissionContext.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2014-05-23 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2098:
-

Description: 
This jira is created for supporting app priorities in fair scheduler. 
AppSchedulable hard codes priority of apps to 1,we should
change this to get priority from ApplicationSubmissionContext.

  was:
This jira is for supporting app priorities in fair scheduler. AppSchedulable 
hard codes priority of apps to 1,we should
change this to get priority from ApplicationSubmissionContext.


 App priority support in Fair Scheduler
 --

 Key: YARN-2098
 URL: https://issues.apache.org/jira/browse/YARN-2098
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 This jira is created for supporting app priorities in fair scheduler. 
 AppSchedulable hard codes priority of apps to 1,we should
 change this to get priority from ApplicationSubmissionContext.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2014-05-23 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2098:
-

Description: 
This jira is for supporting app priorities in fair scheduler. AppSchedulable 
hard codes priority of apps to 1,we should
change this to get priority from ApplicationSubmissionContext.

  was:
This jira is for supporting app priorities in fair schduler. 
AppSchedulable hard codes priority of apps to 1,we should
change this to get priority from ApplicationSubmissionContext.


 App priority support in Fair Scheduler
 --

 Key: YARN-2098
 URL: https://issues.apache.org/jira/browse/YARN-2098
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 This jira is for supporting app priorities in fair scheduler. AppSchedulable 
 hard codes priority of apps to 1,we should
 change this to get priority from ApplicationSubmissionContext.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-05-23 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007698#comment-14007698
 ] 

Jian Fang commented on YARN-796:


I'd like to add a use case to this JIRA. 

In a cloud environment, Hadoop could run on heterogeneous groups of 
instances. Take Amazon EMR as an example: usually an EMR Hadoop cluster runs with 
master, core, and task groups, where the task group could be spot instances 
that can go away at any time. As a result, we would like to have a tag capability on 
each node. That is to say, when a node manager starts up, it will load the tags 
from the configuration file. Then, the resource manager could refine the 
scheduling results based on the tags. 

One good example is that we don't want an application master to be assigned to 
any spot instance in a task group, because that instance could be taken away by 
EC2 at any time. 

If Hadoop resources could support a tag capability, then we could extend the 
current scheduling algorithm to add constraints so as not to assign the application 
master to a task node.

We don't really need any admin capability for the tags (though it would still be good to have), 
since the tags are static and can be specified in a configuration file, for 
example yarn-site.xml.
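A hedged sketch of how a node manager might read such static tags at startup (the property name yarn.nodemanager.node-tags is purely hypothetical; YARN-796 has not defined any such key or API):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeTagSketch {
  public static void main(String[] args) {
    // Hypothetical: a node manager reads static tags from its local config
    // at startup and reports them to the RM so the scheduler can, e.g.,
    // avoid placing application masters on "spot" nodes.
    Configuration conf = new YarnConfiguration();
    String[] tags = conf.getStrings("yarn.nodemanager.node-tags", "on-demand");
    for (String tag : tags) {
      System.out.println("node tag: " + tag);
    }
  }
}
{code}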


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-23 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created YARN-2099:


 Summary: Preemption in fair scheduler should consider app 
priorities
 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.5.0
Reporter: Ashwin Shankar


Fair scheduler should take app priorities into account while
preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-23 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007703#comment-14007703
 ] 

Anubhav Dhoot commented on YARN-1366:
-

Looks good overall; some minor comments below.

In AMRMClientImpl, populatePendingReleaseRequests could be renamed to 
removePendingReleaseRequests, as it is removing them.
We can comment on why we need blacklistedNodes in addition to blacklistAdditions 
and removals.
In testRMContainerOnResync there is an unused assignment to "assigned". Also, it 
might be a good idea to rename the test to indicate the condition and 
the expected result, say testRMContainerResendsRequestsOnRestart?
Also, it would be good to test the pendingRelease in TestRMContainerAllocator, 
maybe add 

 

 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
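A hedged, illustrative sketch of the AM-side resync behavior described above (field and method names are illustrative, not the actual AMRMClient members):

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: on a resync response the AM resets the allocate RPC
// sequence number to 0 and resends its entire outstanding request set,
// instead of shutting down as it did before this change.
public class ResyncSketch {
  private int lastResponseId = 1234;                     // current allocate sequence number
  private final List<String> outstandingAsks = new ArrayList<String>();
  private final List<String> pendingReleases = new ArrayList<String>();

  void onResyncCommand(List<String> nextAsk, List<String> nextRelease) {
    lastResponseId = 0;                                  // restart the sequence at 0
    nextAsk.addAll(outstandingAsks);                     // resend every outstanding ask
    nextRelease.addAll(pendingReleases);                 // and every pending release
  }

  public static void main(String[] args) {
    ResyncSketch am = new ResyncSketch();
    am.outstandingAsks.add("2 x <4096MB, 1 vcore> on *");
    am.pendingReleases.add("container_001");
    List<String> ask = new ArrayList<String>();
    List<String> release = new ArrayList<String>();
    am.onResyncCommand(ask, release);
    System.out.println("responseId=" + am.lastResponseId
        + " asks=" + ask + " releases=" + release);
  }
}
{code}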



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute

2014-05-23 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2012:
-

Attachment: YARN-2012-v3.txt

 Fair Scheduler : Default rule in queue placement policy can take a queue as 
 an optional attribute
 -

 Key: YARN-2012
 URL: https://issues.apache.org/jira/browse/YARN-2012
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt


 Currently the 'default' rule in the queue placement policy, if applied, puts the app in 
 the root.default queue. It would be great if we could make the 'default' rule 
 optionally point to a different queue as the default queue.
 This default queue can be a leaf queue, or it can also be a parent queue if 
 the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007715#comment-14007715
 ] 

Vinod Kumar Vavilapalli commented on YARN-1937:
---

+1, looks good. Checking this in.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch, YARN-1937.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007744#comment-14007744
 ] 

Hadoop QA commented on YARN-2073:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646441/yarn-2073-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3805//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3805//console

This message is automatically generated.

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
 yarn-2073-3.patch, yarn-2073-4.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2097) Documentation: health check return status

2014-05-23 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated YARN-2097:
--

Attachment: YARN-2097.1.patch

Attached a documentation patch as per NodeHealthMonitorExecutor.reportStatus().
Thanks.

 Documentation: health check return status
 -

 Key: YARN-2097
 URL: https://issues.apache.org/jira/browse/YARN-2097
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Allen Wittenauer
  Labels: newbie
 Attachments: YARN-2097.1.patch


 We need to document that the output of the health check script is ignored on 
 non-0 exit status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007759#comment-14007759
 ] 

Hadoop QA commented on YARN-1937:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646580/YARN-1937.5.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3810//console

This message is automatically generated.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch, YARN-1937.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh

2014-05-23 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated YARN-1427:
--

Attachment: YARN-1427.1.patch

Attached patch. Thanks.

 yarn-env.cmd should have the analog comments that are in yarn-env.sh
 

 Key: YARN-1427
 URL: https://issues.apache.org/jira/browse/YARN-1427
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Zhijie Shen
  Labels: newbie, windows
 Attachments: YARN-1427.1.patch


 There are paragraphs about RM/NM env vars (probably AHS as well soon) 
 in yarn-env.sh. Should the Windows version of the script provide similar 
 comments?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007757#comment-14007757
 ] 

Hadoop QA commented on YARN-2083:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646479/YARN-2083.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3807//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3807//console

This message is automatically generated.

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, and use it instead of 
 fitsIn in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-23 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007768#comment-14007768
 ] 

Anubhav Dhoot commented on YARN-1365:
-

The error is that RMAppRecoveredTransition leaves it in LAUNCHED and then the scheduler 
executes ATTEMPT_ADDED. I see Jian fixed it in a certain way in YARN-1368, but 
that only addresses it if it's in LAUNCHED. If the state reaches RUNNING before 
that, we still get the error. The option I see is to pass in a flag to 
AppAttemptAddedSchedulerEvent that tells the scheduler not to issue ATTEMPT_ADDED; 
this would be set in RMAppRecoveredTransition. Let me know what you think.
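A hypothetical sketch of the proposed flag (the real AppAttemptAddedSchedulerEvent constructor may carry different parameters; this only illustrates the idea):

{code}
// Recovery code would pass notifyAttemptAdded=false so the scheduler skips
// sending ATTEMPT_ADDED to an attempt that is already LAUNCHED or RUNNING.
public class AppAttemptAddedEventSketch {
  private final String applicationAttemptId;   // stand-in for ApplicationAttemptId
  private final boolean notifyAttemptAdded;

  public AppAttemptAddedEventSketch(String applicationAttemptId,
      boolean notifyAttemptAdded) {
    this.applicationAttemptId = applicationAttemptId;
    this.notifyAttemptAdded = notifyAttemptAdded;
  }

  public boolean shouldNotifyAttemptAdded() {
    return notifyAttemptAdded;
  }

  public String getApplicationAttemptId() {
    return applicationAttemptId;
  }
}
{code}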

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.003.patch, YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1216) Deprecate/ mark private few methods from YarnConfiguration

2014-05-23 Thread Rekha Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007766#comment-14007766
 ] 

Rekha Joshi commented on YARN-1216:
---

Needs to be closed. Existing o.a.h.yarn.webapp.util.WebAppUtils (2.4.0) covers 
this. Thanks.

 Deprecate/ mark private few methods from YarnConfiguration
 --

 Key: YARN-1216
 URL: https://issues.apache.org/jira/browse/YARN-1216
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Omkar Vinit Joshi
Priority: Minor
  Labels: newbie

 Today we have few methods in YarnConfiguration which should ideally be moved 
 to some utility class.
 [related comment | 
 https://issues.apache.org/jira/browse/YARN-1203?focusedCommentId=13771281page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13771281
  ]
 * getRMWebAppURL
 * getRMWebAppHostAndPort
 * getProxyHostAndPort



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-23 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-596:
-

Attachment: YARN-596.patch

 In fair scheduler, intra-application container priorities affect 
 inter-application preemption decisions
 ---

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted in order of the priority that the container was requested 
 in.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007778#comment-14007778
 ] 

Hadoop QA commented on YARN-1474:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646330/YARN-1474.16.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1279 javac 
compiler warnings (more than the trunk's current 1276 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 17 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3808//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3808//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3808//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3808//console

This message is automatically generated.

 Make schedulers services
 

 Key: YARN-1474
 URL: https://issues.apache.org/jira/browse/YARN-1474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.3.0, 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
 YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
 YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
 YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, 
 YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch


 Schedulers currently have a reinitialize method but no start and stop.  Fitting 
 them into the YARN service model would make things more coherent.
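
As a rough illustration of what such a lifecycle could look like, here is a plain-Java stand-in (the interface and class names are made up, not Hadoop's Service API):

{code:java}
// Illustrative only: a scheduler with an explicit init/start/stop lifecycle in
// addition to today's reinitialize(). Names below are hypothetical.
public class SchedulerServiceSketch {

  interface LifecycleService {
    void init();      // one-time configuration
    void start();     // begin accepting scheduling events
    void stop();      // release threads and other resources
  }

  static class SimpleScheduler implements LifecycleService {
    private volatile boolean running;

    @Override public void init()  { System.out.println("scheduler initialized"); }
    @Override public void start() { running = true;  System.out.println("scheduler started"); }
    @Override public void stop()  { running = false; System.out.println("scheduler stopped"); }

    void reinitialize() {
      // Today's model: configuration is reloaded here, with no explicit start/stop.
      System.out.println("scheduler reconfigured, running=" + running);
    }
  }

  public static void main(String[] args) {
    SimpleScheduler s = new SimpleScheduler();
    s.init();
    s.start();
    s.reinitialize();
    s.stop();
  }
}
{code}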



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2095) Large MapReduce Job stops responding

2014-05-23 Thread Clay McDonald (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007779#comment-14007779
 ] 

Clay McDonald commented on YARN-2095:
-

Vinod, could you read Eric's email below? Would you agree that there should be 
a log entry from YARN in this case?

Clay McDonald 
Cell: 202.560.4101 
Direct: 202.747.5962 


From: Eric Mizell [mailto:emiz...@hortonworks.com] 
Sent: Friday, May 23, 2014 4:18 PM
To: Clay McDonald
Subject: Re: [jira] [Created] (YARN-2095) Large MapReduce Job stops responding

Clay,

What I noticed is that your reducers were overloaded and were on the brink of 
running out of memory. The Java heaps were running at 99% and continuously 
GC'ing while the app was reading from disk. So it was trying its best to 
process the job with limited resources. I agree with you that it would be 
helpful if the container could put out a log message when there are GC issues, 
to help with debugging.

Thanks,

Eric Mizell  Director Solution Engineering, Hortonworks
Mobile: 678-761-7623
Email: emiz...@hortonworks.com
Website: http://www.hortonworks.com/
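
As a rough illustration of the kind of in-container logging suggested above (not an existing YARN/MapReduce feature), a task could periodically report heap pressure using the standard JMX beans:

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustration only: warn when the heap is nearly full and report cumulative GC
// time, so a "hung" container at least leaves a trail in its log.
public class GcPressureLogger {

  static void logHeapPressure() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    if (heap.getMax() <= 0) {
      return; // max heap not defined on this JVM
    }
    double usedRatio = (double) heap.getUsed() / heap.getMax();

    long totalGcMillis = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      totalGcMillis += Math.max(0, gc.getCollectionTime());
    }

    if (usedRatio > 0.95) {
      System.err.printf("WARN heap %.0f%% full, cumulative GC time %d ms%n",
          usedRatio * 100, totalGcMillis);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    for (int i = 0; i < 3; i++) {
      logHeapPressure();
      Thread.sleep(1000);
    }
  }
}
{code}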





 Large MapReduce Job stops responding
 

 Key: YARN-2095
 URL: https://issues.apache.org/jira/browse/YARN-2095
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6
Reporter: Clay McDonald
Priority: Blocker

 Very large jobs (7,455 mappers and 999 reducers) hang. Jobs run well, but 
 logging to the container logs stops after running 33 hours. The job appears to 
 be hung. The status of the job is RUNNING. No error messages are found in the logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2059) Extend access control for admin acls

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007781#comment-14007781
 ] 

Hadoop QA commented on YARN-2059:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646475/YARN-2059.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3811//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3811//console

This message is automatically generated.

 Extend access control for admin acls
 

 Key: YARN-2059
 URL: https://issues.apache.org/jira/browse/YARN-2059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2059.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2100) Refactor the code of Kerberos + DT authentication

2014-05-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2100:
-

 Summary: Refactor the code of Kerberos + DT authentication
 Key: YARN-2100
 URL: https://issues.apache.org/jira/browse/YARN-2100
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen


The customized Kerberos + DT authentication of the timeline server largely 
borrows from that of HttpFS; therefore, there is a portion of duplicate code. We 
should think about refactoring the code if necessary.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-23 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007786#comment-14007786
 ] 

Sandy Ryza commented on YARN-2083:
--

I think it would be better to solve this by making sure the queue cannot 
actually exceed the maxResources limit.  I.e. to also check the constraint 
*after* picking a container that would be assigned.
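
A minimal sketch of that alternative, with made-up numbers and method names rather than the FairScheduler API: re-check the maxResources cap once a candidate container is known.

{code:java}
// Illustration only: reject an assignment if adding the chosen container would
// push the queue's usage past maxResources.
public class PostPickCheckSketch {

  static boolean wouldExceedMax(long usedMB, long containerMB, long maxMB) {
    return usedMB + containerMB > maxMB;
  }

  public static void main(String[] args) {
    long usedMB = 7168, maxMB = 8192;
    long candidateContainerMB = 2048;
    if (wouldExceedMax(usedMB, candidateContainerMB, maxMB)) {
      System.out.println("skip assignment: queue would exceed maxResources");
    } else {
      System.out.println("assign container");
    }
  }
}
{code}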

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, to use instead of 
 fitsIn in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007787#comment-14007787
 ] 

Hadoop QA commented on YARN-2026:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646588/YARN-2026-v1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3809//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3809//console

This message is automatically generated.

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt


 While using hierarchical queues in the fair scheduler, there are a few scenarios 
 where we have seen a leaf queue with the least fair share take a majority of 
 the cluster and starve a sibling parent queue which has a greater weight/fair 
 share, while preemption doesn't kick in to reclaim resources.
 The root cause seems to be that the fair share of a parent queue is distributed 
 to all its children irrespective of whether each child is an active or an 
 inactive (no apps running) queue. Preemption based on fair share kicks in only 
 if the usage of a queue is less than 50% of its fair share and it has demand 
 greater than that. When there are many queues under a parent queue (with a high 
 fair share), each child queue's fair share becomes really low. As a result, when 
 only a few of these child queues have apps running, they reach their *tiny* fair 
 share quickly and preemption doesn't happen even if other leaf 
 queues (non-siblings) are hogging the cluster.
 This can be solved by dividing the fair share of a parent queue only among its 
 active child queues.
 Here is an example describing the problem and the proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is a parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues: 
 root.HighPriorityQueue.childQ(1..10)
 The above config results in root.HighPriorityQueue having an 80% fair share, 
 and each of its ten child queues would have an 8% fair share. Preemption would 
 happen only if a child queue's usage fell below 4% (0.5*8=4). 
 Let's say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) queues and a few apps are running in 
 root.lowPriorityQueue, which is taking up 95% of the cluster.
 Up to this point, the behavior of FS is correct.
 Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% 
 of the cluster. It would get only the available 5% of the cluster, and 
 preemption wouldn't kick in since it is above 4% (half its fair share). This is 
 bad considering childQ1 is under a high-priority parent queue which has an *80% 
 fair share*.
 Until root.lowPriorityQueue starts relinquishing containers, we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent's fair share only to active 
 queues.
 So in the example above, since childQ1 is the only active queue 
 under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Also note that a similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2: if childQ2 
 hogs the cluster, it can take up 95% of the cluster and childQ1 would be stuck 
 at 5% until childQ2 starts relinquishing containers.
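
As a toy illustration of the proposed computation (not the FairScheduler code; the method and map-based representation are made up), a parent's fair share can be split by weight over active children only:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration: split a parent queue's fair share by weight across
// *active* children only, so a lone active child under a high-share parent
// gets a meaningful share (and preemption can trigger).
public class ActiveFairShareSketch {

  static Map<String, Double> split(double parentShare, Map<String, Double> weights,
                                   Map<String, Boolean> active) {
    double totalActiveWeight = 0;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      if (active.getOrDefault(e.getKey(), false)) {
        totalActiveWeight += e.getValue();
      }
    }
    Map<String, Double> shares = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      boolean isActive = active.getOrDefault(e.getKey(), false);
      shares.put(e.getKey(), isActive && totalActiveWeight > 0
          ? parentShare * e.getValue() / totalActiveWeight : 0.0);
    }
    return shares;
  }

  public static void main(String[] args) {
    // HighPriorityQueue holds 80% of the cluster; only childQ1 is active.
    Map<String, Double> weights = new LinkedHashMap<>();
    Map<String, Boolean> active = new LinkedHashMap<>();
    for (int i = 1; i <= 10; i++) {
      weights.put("childQ" + i, 1.0);
      active.put("childQ" + i, i == 1);
    }
    System.out.println(split(0.80, weights, active)); // childQ1 -> 0.8, others -> 0.0
  }
}
{code}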

[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-23 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007789#comment-14007789
 ] 

Wei Yan commented on YARN-2099:
---

Hi, Ashwin.
We have discussed priority and preemption a little bit in YARN-596. 
The old preemption implementation is based on the container's priority, and 
YARN-596 starts to consider fair share.

I have one question here; assume all queues use the fair share policy. When 
taking app priorities into account, if we want to preempt a container, which 
option do we use? (1) We collect all running apps from over-fair-share queues 
and select the one with the lowest priority as the candidate, or (2) we first 
choose the queue which is most over its fair share, then select the app with 
the lowest priority inside that queue as the candidate.
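
To make option (2) concrete, here is a toy, self-contained selection sketch; all types, and the assumption that a smaller value means lower priority, are placeholders rather than the scheduler's internals:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch of option (2): first pick the queue that is most over its fair
// share, then pick the lowest-priority running app inside that queue.
public class PreemptionCandidateSketch {

  static class App {
    final String name; final int priority;   // assumption: smaller value = lower priority
    App(String name, int priority) { this.name = name; this.priority = priority; }
  }

  static class Queue {
    final String name; final double usage; final double fairShare; final List<App> apps;
    Queue(String name, double usage, double fairShare, List<App> apps) {
      this.name = name; this.usage = usage; this.fairShare = fairShare; this.apps = apps;
    }
  }

  static App pickCandidate(List<Queue> queues) {
    Queue mostOver = null;
    for (Queue q : queues) {
      if (q.usage > q.fairShare && !q.apps.isEmpty()
          && (mostOver == null || q.usage - q.fairShare > mostOver.usage - mostOver.fairShare)) {
        mostOver = q;
      }
    }
    if (mostOver == null) return null;
    App lowest = null;
    for (App a : mostOver.apps) {
      if (lowest == null || a.priority < lowest.priority) lowest = a;
    }
    return lowest;
  }

  public static void main(String[] args) {
    List<Queue> queues = new ArrayList<Queue>();
    queues.add(new Queue("q1", 0.60, 0.40, Arrays.asList(new App("a1", 5), new App("a2", 1))));
    queues.add(new Queue("q2", 0.35, 0.30, Arrays.asList(new App("b1", 3))));
    System.out.println(pickCandidate(queues).name);   // prints "a2"
  }
}
{code}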

 Preemption in fair scheduler should consider app priorities
 ---

 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.5.0
Reporter: Ashwin Shankar

 Fair scheduler should take app priorities into account while
 preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2101) Document the system filters of the timeline entity

2014-05-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2101:
-

 Summary: Document the system filters of the timeline entity
 Key: YARN-2101
 URL: https://issues.apache.org/jira/browse/YARN-2101
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-1937, to support ACLs, we reserved a filter name for the timeline 
server's use, which should not be used by users. We need to document this 
system filter explicitly so that users know not to use it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute

2014-05-23 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007797#comment-14007797
 ] 

Sandy Ryza commented on YARN-2012:
--

+1

 Fair Scheduler : Default rule in queue placement policy can take a queue as 
 an optional attribute
 -

 Key: YARN-2012
 URL: https://issues.apache.org/jira/browse/YARN-2012
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt


 Currently the 'default' rule in the queue placement policy, if applied, puts the 
 app in the root.default queue. It would be great if we could make the 'default' 
 rule optionally point to a different queue as the default queue.
 This default queue can be a leaf queue, or it can also be a parent queue if 
 the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).
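
Purely as an illustration of the requested behaviour (the method and queue names are made up, and this is not the QueuePlacementRule API), the default step would simply fall back to a configured queue:

{code:java}
// Illustrative only: a "default" placement step that falls back to a configured
// queue instead of the hard-coded root.default.
public class DefaultRuleSketch {

  static String assignQueue(String requestedQueue, String configuredDefault) {
    if (requestedQueue != null && !requestedQueue.isEmpty()) {
      return requestedQueue;
    }
    return configuredDefault != null ? configuredDefault : "root.default";
  }

  public static void main(String[] args) {
    System.out.println(assignQueue(null, "root.adhoc"));   // root.adhoc
    System.out.println(assignQueue(null, null));           // root.default (current behaviour)
  }
}
{code}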



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007800#comment-14007800
 ] 

Zhijie Shen commented on YARN-1937:
---

Tested the timeline security stack end-to-end on my local cluster so far. It 
seems to work fine: authentication works as expected, and only the owner can 
view the timeline data he posted.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch, YARN-1937.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2012) Fair Scheduler: allow default queue placement rule to take an arbitrary queue

2014-05-23 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2012:
-

Summary: Fair Scheduler: allow default queue placement rule to take an 
arbitrary queue  (was: Fair Scheduler : Default rule in queue placement policy 
can take a queue as an optional attribute)

 Fair Scheduler: allow default queue placement rule to take an arbitrary queue
 -

 Key: YARN-2012
 URL: https://issues.apache.org/jira/browse/YARN-2012
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt


 Currently the 'default' rule in the queue placement policy, if applied, puts the 
 app in the root.default queue. It would be great if we could make the 'default' 
 rule optionally point to a different queue as the default queue.
 This default queue can be a leaf queue, or it can also be a parent queue if 
 the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

