[jira] [Created] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows

2014-10-20 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2711:
---

 Summary: TestDefaultContainerExecutor#testContainerLaunchError 
fails on Windows
 Key: YARN-2711
 URL: https://issues.apache.org/jira/browse/YARN-2711
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev


The testContainerLaunchError test fails on Windows with the following error -

{noformat}
java.io.FileNotFoundException: File file:/bin/echo does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
at 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120)
at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117)
at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113)
at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019)
at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145)
at 
org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289)
{noformat}
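The failure shows the test hard-codes /bin/echo, which does not exist on Windows. Below is a minimal, hedged sketch of the kind of platform-dependent command selection such a fix typically involves; the attached patch may do this differently. Shell.WINDOWS is Hadoop's existing OS check.

{code}
// Hedged sketch only; not necessarily what apache-yarn-2711.0.patch does.
import org.apache.hadoop.util.Shell;

public class EchoCommandSketch {
  // Pick a command that exists on the local OS instead of hard-coding /bin/echo.
  static String[] echoCommand() {
    return Shell.WINDOWS
        ? new String[] {"cmd", "/c", "echo", "hello"}   // Windows shell built-in
        : new String[] {"/bin/echo", "hello"};          // POSIX path used by the test today
  }
}
{code}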



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows

2014-10-20 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2711:

Attachment: apache-yarn-2711.0.patch

Patch with fix attached.

> TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
> --
>
> Key: YARN-2711
> URL: https://issues.apache.org/jira/browse/YARN-2711
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2711.0.patch
>
>
> The testContainerLaunchError test fails on Windows with the following error -
> {noformat}
> java.io.FileNotFoundException: File file:/bin/echo does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176746#comment-14176746
 ] 

Hadoop QA commented on YARN-2711:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12675786/apache-yarn-2711.0.patch
  against trunk revision da80c4d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5458//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5458//console

This message is automatically generated.

> TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
> --
>
> Key: YARN-2711
> URL: https://issues.apache.org/jira/browse/YARN-2711
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2711.0.patch
>
>
> The testContainerLaunchError test fails on Windows with the following error -
> {noformat}
> java.io.FileNotFoundException: File file:/bin/echo does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2691) User level API support for priority label

2014-10-20 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2691:
-
Attachment: YARN-2691.patch

> User level API support for priority label
> -
>
> Key: YARN-2691
> URL: https://issues.apache.org/jira/browse/YARN-2691
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Sunil G
>Assignee: Rohith
> Attachments: YARN-2691.patch
>
>
> Support for handling Application-Priority label coming from client to 
> ApplicationSubmissionContext.
> Common api support for user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-10-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---
Attachment: yarn-2010-3.patch

Re-uploading the last patch, which has a single {{catch(Exception)}}.

[~vinodkv] - would you still prefer having multiple catch blocks, one for each 
exception? IMO, catching {{ConnectException}} doesn't seem very readable; we 
could add a comment on why we are adding that catch, but we might not be able 
to enumerate all possible cases. That said, I am okay with catching 
ConnectException and Exception separately. Please advise.
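To make the two options concrete, here is a schematic sketch; the class and method names are illustrative, not the actual RM recovery code.

{code}
// Illustrative sketch of the two catch strategies under discussion.
import java.net.ConnectException;

public class RecoveryCatchSketch {

  interface RecoveryStep {            // stand-in for the real app-recovery call
    void run() throws Exception;
  }

  // Option A: a single broad catch -- any recovery failure skips the app.
  static void singleCatch(RecoveryStep step) {
    try {
      step.run();
    } catch (Exception e) {
      System.err.println("Failed to recover application, skipping it: " + e);
    }
  }

  // Option B: handle ConnectException separately (with a comment on why),
  // and keep the broad catch for everything else.
  static void separateCatches(RecoveryStep step) throws ConnectException {
    try {
      step.run();
    } catch (ConnectException e) {
      // Presumably a transient connectivity problem: rethrow so a later
      // transition-to-active attempt can retry instead of dropping the app.
      throw e;
    } catch (Exception e) {
      System.err.println("Failed to recover application, skipping it: " + e);
    }
  }
}
{code}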

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
>  
> ... 13 more 
> {noformat}

[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176985#comment-14176985
 ] 

Hadoop QA commented on YARN-2010:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675843/yarn-2010-3.patch
  against trunk revision d5084b9.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5459//console

This message is automatically generated.

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
>  
> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177003#comment-14177003
 ] 

Karthik Kambatla commented on YARN-2579:


[~rohithsharma] - can you help me understand the issue here better?

{{resetDispatcher}} is called from either transitionToStandby or 
transitionToActive, both of which are synchronized methods. Under what 
conditions can {{resetDispatcher}} be called by two threads simultaneously? 
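For context, a schematic of the synchronization being referred to; this is an illustrative shape only, not the actual RM code.

{code}
// Illustrative only: two synchronized methods on the same object cannot run
// concurrently, so resetDispatcher() should not be reachable from two threads
// at the same time through these paths.
public class HaTransitionSketch {
  private Object dispatcher = new Object();

  public synchronized void transitionToActive()  { resetDispatcher(); }
  public synchronized void transitionToStandby() { resetDispatcher(); }

  private void resetDispatcher() {
    dispatcher = new Object();  // stand-in for stopping and recreating the dispatcher
  }
}
{code}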

> Both RM's state is Active , but 1 RM is not really active.
> --
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2579.patch, YARN-2579.patch
>
>
> I encountered a situation where both RMs' web pages were accessible and 
> their state was displayed as Active, but one RM's ActiveServices were 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes

2014-10-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177087#comment-14177087
 ] 

Wangda Tan commented on YARN-2398:
--

[~ozawa], the log you attached will be resolved by YARN-2705; it's not the same 
as the original error: 
https://issues.apache.org/jira/browse/YARN-2398?focusedCommentId=14090771&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14090771

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jason Lowe
> Attachments: TestResourceTrackerOnHA-output.txt
>
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2710) RM HA tests failed intermittently on trunk

2014-10-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reopened YARN-2710:
--

> RM HA tests failed intermittently on trunk
> --
>
> Key: YARN-2710
> URL: https://issues.apache.org/jira/browse/YARN-2710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Wangda Tan
> Attachments: 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt
>
>
> Failures like the following can happen in TestApplicationClientProtocolOnHA, 
> TestResourceTrackerOnHA, etc.
> {code}
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
>   Time elapsed: 9.491 sec  <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 
> to asf905.gq1.ygridcore.net:28032 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>   at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
>   at 
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2710) RM HA tests failed intermittently on trunk

2014-10-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177095#comment-14177095
 ] 

Wangda Tan commented on YARN-2710:
--

[~jianhe], [~ozawa], [~haosd...@gmail.com]:
I just tried again on the latest trunk: running "mvn clean test 
-Dtest=TestApplicationClientProtocolOnHA" succeeds, but running 
-Dtest=TestResourceTrackerOnHA fails.
I attached the log from the TestApplicationClientProtocolOnHA run; even though 
it succeeded, the "Cannot connection" / "EOF error" messages still appear.
I suspect this is caused by some network configuration issue.

And to [~ozawa]: as I commented in YARN-2398, this is not the same issue as 
YARN-2398, so I am reopening this ticket; people can report here if they hit 
the same problem.

Thanks,
Wangda

> RM HA tests failed intermittently on trunk
> --
>
> Key: YARN-2710
> URL: https://issues.apache.org/jira/browse/YARN-2710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Wangda Tan
> Attachments: 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt
>
>
> Failures like the following can happen in TestApplicationClientProtocolOnHA, 
> TestResourceTrackerOnHA, etc.
> {code}
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
>   Time elapsed: 9.491 sec  <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 
> to asf905.gq1.ygridcore.net:28032 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>   at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
>   at 
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-10-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177099#comment-14177099
 ] 

Vinod Kumar Vavilapalli commented on YARN-2010:
---

Sorry, I missed this and lost context, so please help clarify: this time, we 
got a ConnectException to Zookeeper, due to which we are skipping apps? That 
doesn't sound right either.

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
>  
> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart

2014-10-20 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-2712:


 Summary: Adding tests about FSQueue and headroom of FairScheduler 
to TestWorkPreservingRMRestart
 Key: YARN-2712
 URL: https://issues.apache.org/jira/browse/YARN-2712
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA


TestWorkPreservingRMRestart#testSchedulerRecovery is partially missing test 
cases for FairScheduler. We should add them.

{code}
   // Until YARN-1959 is resolved
   if (scheduler.getClass() != FairScheduler.class) {
 assertEquals(availableResources, schedulerAttempt.getHeadroom());
   }
{code}
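For reference, a hedged sketch of what lifting that guard might look like once YARN-1959 is resolved, using only the names from the snippet above; the actual test change may differ.

{code}
// Hypothetical sketch, not the actual test change: once YARN-1959 is resolved,
// the guard above could be dropped so the headroom assertion also covers
// FairScheduler.
assertEquals(availableResources, schedulerAttempt.getHeadroom());
{code}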



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart

2014-10-20 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2712:
-
Issue Type: Sub-task  (was: Test)
Parent: YARN-556

> Adding tests about FSQueue and headroom of FairScheduler to 
> TestWorkPreservingRMRestart
> ---
>
> Key: YARN-2712
> URL: https://issues.apache.org/jira/browse/YARN-2712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>
> TestWorkPreservingRMRestart#testSchedulerRecovery is partially missing test 
> cases for FairScheduler. We should add them.
> {code}
>// Until YARN-1959 is resolved
>if (scheduler.getClass() != FairScheduler.class) {
>  assertEquals(availableResources, schedulerAttempt.getHeadroom());
>}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2704:
--
Attachment: YARN-2704.1.patch

Uploaded a patch:
- make the RM automatically request an HDFS delegation token on behalf of the 
user if 1) the user doesn't provide a delegation token on app submission, or 
2) the existing HDFS delegation token is about to expire within 10 hours.
- NMs heartbeat with the RM to get the new tokens and use them for localization 
and log-aggregation.
- a config is added to enable/disable this feature.
- this approach also requires the NameNode to be configured with the RM as a 
proxy user (a sketch of the standard proxy-user settings follows).
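For the last point, a hedged sketch of the standard Hadoop proxy-user settings this implies on the NameNode side; the "yarn" user and the host names are assumptions, not values taken from the patch.

{code}
// Hedged sketch: standard hadoop.proxyuser.* keys; the actual user and hosts
// are deployment-specific assumptions here.
import org.apache.hadoop.conf.Configuration;

public class ProxyUserConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Allow the RM user ("yarn", assumed) to impersonate application submitters
    // when fetching HDFS delegation tokens on their behalf.
    conf.set("hadoop.proxyuser.yarn.hosts", "rm1.example.com,rm2.example.com");
    conf.set("hadoop.proxyuser.yarn.groups", "*");
  }
}
{code}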

>  Localization and log-aggregation will fail if hdfs delegation token expired 
> after token-max-life-time
> --
>
> Key: YARN-2704
> URL: https://issues.apache.org/jira/browse/YARN-2704
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2704.1.patch
>
>
> In secure mode, YARN requires the hdfs-delegation token to do localization 
> and log aggregation on behalf of the user. But the hdfs delegation token will 
> eventually expire after max-token-life-time.  So,  localization and log 
> aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-10-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177145#comment-14177145
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Thanks for the update, and sorry again for the delay :).
The general approach looks very good to me; I'm still reviewing the tests and 
other details. One quick suggestion: you don't need to re-implement an ordered 
list, whose insertion time complexity is O(n); you can use Java's PriorityQueue 
or org.apache.hadoop.utils.PriorityQueue instead.

Wangda
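
A trivial illustration of the suggestion, with a placeholder element type; this is not the patch's code.

{code}
import java.util.PriorityQueue;

public class PriorityQueueSketch {
  public static void main(String[] args) {
    // O(log n) insertion, instead of O(n) insertion into a hand-rolled ordered list.
    PriorityQueue<Long> pending = new PriorityQueue<Long>();
    pending.add(30L);
    pending.add(10L);
    System.out.println(pending.poll());  // 10, the smallest element
  }
}
{code}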



> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
> YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
> YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, 
> YARN-2056.201410132225.txt, YARN-2056.201410141330.txt
>
>
> We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-10-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177147#comment-14177147
 ] 

Wangda Tan commented on YARN-2056:
--

Oh, sorry, JIRA interpreted "O\(n\)" as "O(n)", which is not what I originally meant :-p

> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
> YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
> YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, 
> YARN-2056.201410132225.txt, YARN-2056.201410141330.txt
>
>
> We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-10-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177167#comment-14177167
 ] 

Karthik Kambatla commented on YARN-2010:


In this particular case, we are unable to renew the HDFS delegation token due 
to a ConnectException to HDFS. We are not yet clear why this happens. Even if 
this is a transient HDFS issue, both RMs fail to transition to active and the 
individual RMActiveServices instances transition to the STOPPED state. Any 
subsequent attempts to transition the RM to active fail because 
RMActiveServices is not INITED, as in the standby case.

I spent some more time thinking about this, and I think there might be merit in 
catching exceptions separately. A ConnectException is hopefully due to a 
transient issue; I don't think we can do much about a permanent one. When we 
run into this, we should probably transition cleanly to standby, so that 
subsequent attempts to transition to active may succeed. 

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)

[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-10-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177169#comment-14177169
 ] 

Wangda Tan commented on YARN-2314:
--

[~jlowe], thanks for the update, the patch looks good to me, +1!
[~rajesh.balamohan], thanks for your performance report based on this. 20-30ms 
is still a latency that cannot be totally ignored for interactive tasks. At 
least, this patch gives us a way to cache connections via a configuration 
option.

Wangda
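
To illustrate the general idea of capping such a proxy cache, here is a generic sketch; it is not the ContainerManagementProtocolProxy implementation.

{code}
// Generic bounded LRU map; illustrative only. The real proxy cache also has to
// close the evicted proxy's connection.
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedProxyCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  public BoundedProxyCache(int maxSize) {
    super(16, 0.75f, true);            // access order, so iteration is LRU-first
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxSize;           // evict the least recently used entry
  }
}
{code}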

> ContainerManagementProtocolProxy can create thousands of threads for a large 
> cluster
> 
>
> Key: YARN-2314
> URL: https://issues.apache.org/jira/browse/YARN-2314
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-2314.patch, YARN-2314v2.patch, 
> disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch, 
> tez-yarn-2314.xlsx
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
> this cache is configurable.  However the cache can grow far beyond the 
> configured size when running on a large cluster and blow AM address/container 
> limits.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2703) Add logUploadedTime into LogValue for better display

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2703:

Attachment: YARN-2703.1.patch

> Add logUploadedTime into LogValue for better display
> 
>
> Key: YARN-2703
> URL: https://issues.apache.org/jira/browse/YARN-2703
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2703.1.patch
>
>
> Right now, the container can upload its logs multiple times. Sometimes, 
> containers write different logs into the same log file.  After the log 
> aggregation, when we query those logs, it will show:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> The same files could be displayed multiple times, but we cannot figure out 
> which logs came first. We could add an extra logUploadedTime to give users a 
> better understanding of the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177185#comment-14177185
 ] 

Xuan Gong commented on YARN-2703:
-

Add logUploadedTime into LogValue context for better display. This patch is 
based on YARN-2582

> Add logUploadedTime into LogValue for better display
> 
>
> Key: YARN-2703
> URL: https://issues.apache.org/jira/browse/YARN-2703
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2703.1.patch
>
>
> Right now, the container can upload its logs multiple times. Sometimes, 
> containers write different logs into the same log file.  After the log 
> aggregation, when we query those logs, it will show:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> The same files could be displayed multiple times, but we cannot figure out 
> which logs came first. We could add an extra logUploadedTime to give users a 
> better understanding of the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2713) Broken "RM Home" link in NM Web UI when RM HA is enabled

2014-10-20 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2713:
--

 Summary: Broken "RM Home" link in NM Web UI when RM HA is enabled
 Key: YARN-2713
 URL: https://issues.apache.org/jira/browse/YARN-2713
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


When RM HA is enabled, the 'RM Home' link in the NM WebUI is broken. It points 
to the NM-host:RM-port instead. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions

2014-10-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2209:
--
Attachment: YARN-2209.6.patch

Given that we have already broken compatibility for rolling upgrades, the patch 
should be fine in that sense. Updated the patch against the latest trunk.

> Replace AM resync/shutdown command with corresponding exceptions
> 
>
> Key: YARN-2209
> URL: https://issues.apache.org/jira/browse/YARN-2209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, 
> YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch
>
>
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
> application to re-register on RM restart. we should do the same for 
> AMS#allocate call also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-102014.patch

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> The timeline client currently does not handle the case gracefully when the 
> server is down. Jobs from the distributed shell may fail due to an ATS 
> restart. We may need to add some retry mechanism to the client.
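
A generic retry sketch of the kind of mechanism being proposed; this is illustrative only, not the API added by the attached patch.

{code}
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
  // Retry an operation that may fail with IOException (e.g. while the ATS is down).
  static <T> T retry(Callable<T> op, int maxRetries, long intervalMs) throws Exception {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;
        Thread.sleep(intervalMs);      // fixed interval; real code might back off
      }
    }
    throw last;
  }
}
{code}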



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101914.patch)

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> The timeline client currently does not handle the case gracefully when the 
> server is down. Jobs from the distributed shell may fail due to an ATS 
> restart. We may need to add some retry mechanism to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177274#comment-14177274
 ] 

Zhijie Shen commented on YARN-2582:
---

Looks good to me overall. Just one nit

Make the following methods static?
{code}
  private void containerLogNotFound(String containerId) {
System.out.println("Logs for container " + containerId
  + " are not present in this log-file.");
  }

  private void logDirNotExist(String remoteAppLogDir) {
System.out.println(remoteAppLogDir + "does not exist.");
System.out.println("Log aggregation has not completed or is not enabled.");
  }

  private void emptyLogDir(String remoteAppLogDir) {
System.out.println(remoteAppLogDir + "does not have any log files.");
  }
{code}
Same for 
{code}
  private void createContainerLogInLocalDir(Path appLogsDir,
  ContainerId containerId, FileSystem fs) throws Exception {
{code}
and
{code}
  private void uploadContainerLogIntoRemoteDir(UserGroupInformation ugi,
  Configuration configuration, List rootLogDirs, NodeId nodeId,
  ContainerId containerId, Path appDir, FileSystem fs) throws Exception {
{code}
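
For instance, the first helper as it would look after the suggested change, with the body unchanged from the quote above:

{code}
// Suggested form: no instance state is used, so the helper can be static.
private static void containerLogNotFound(String containerId) {
  System.out.println("Logs for container " + containerId
    + " are not present in this log-file.");
}
{code}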


> Log related CLI and Web UI changes for Aggregated Logs in LRS
> -
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch, YARN-2582.2.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for 
> Long Running Service. Log CLI and related Web UI should be modified 
> accordingly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177296#comment-14177296
 ] 

Xuan Gong commented on YARN-2582:
-

Thanks for the review.

Uploaded a new patch to address all the comments

> Log related CLI and Web UI changes for Aggregated Logs in LRS
> -
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for 
> Long Running Service. Log CLI and related Web UI should be modified 
> accordingly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2582:

Attachment: YARN-2582.3.patch

> Log related CLI and Web UI changes for Aggregated Logs in LRS
> -
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for 
> Long Running Service. Log CLI and related Web UI should be modified 
> accordingly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177299#comment-14177299
 ] 

Hadoop QA commented on YARN-2704:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675875/YARN-2704.1.patch
  against trunk revision d5084b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1269 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5460//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5460//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5460//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5460//console

This message is automatically generated.

>  Localization and log-aggregation will fail if hdfs delegation token expired 
> after token-max-life-time
> --
>
> Key: YARN-2704
> URL: https://issues.apache.org/jira/browse/YARN-2704
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2704.1.patch
>
>
> In secure mode, YARN requires the hdfs-delegation token to do localization 
> and log aggregation on behalf of the user. But the hdfs delegation token will 
> eventually expire after max-token-life-time.  So,  localization and log 
> aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-10-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2690:

Attachment: YARN-2690.002.patch

Done. I had kept it that way to make it easier to review and was planning to 
move them in a later patch, but they logically belong here, so I've updated the 
patch. Testing was done with the reservation unit tests and testReservationApis.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2691) User level API support for priority label

2014-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177302#comment-14177302
 ] 

Sunil G commented on YARN-2691:
---

A quick nit:
ApplicationPriority could implement Comparable. This will help later with
comparisons and error checking.
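
For illustration, a minimal sketch of what a comparable priority holder could look like, assuming a single integer priority field; the class name and field are placeholders, not the actual YARN API:
{code}
// Minimal sketch of a comparable priority holder. Assumes a single integer
// priority field; not the actual YARN ApplicationPriority API.
public class AppPriority implements Comparable<AppPriority> {
  private final int priority;

  public AppPriority(int priority) {
    if (priority < 0) {
      throw new IllegalArgumentException("priority must be non-negative");
    }
    this.priority = priority;
  }

  public int getPriority() {
    return priority;
  }

  @Override
  public int compareTo(AppPriority other) {
    // In this sketch a higher numeric value means a higher priority.
    return Integer.compare(this.priority, other.priority);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof AppPriority && ((AppPriority) o).priority == priority;
  }

  @Override
  public int hashCode() {
    return priority;
  }
}
{code}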

> User level API support for priority label
> -
>
> Key: YARN-2691
> URL: https://issues.apache.org/jira/browse/YARN-2691
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Sunil G
>Assignee: Rohith
> Attachments: YARN-2691.patch
>
>
> Support for handling Application-Priority label coming from client to 
> ApplicationSubmissionContext.
> Common api support for user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177318#comment-14177318
 ] 

Hadoop QA commented on YARN-2673:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12675889/YARN-2673-102014.patch
  against trunk revision d5084b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5462//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5462//console

This message is automatically generated.

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177322#comment-14177322
 ] 

Xuan Gong commented on YARN-2701:
-

[~zxu] Thanks for the feedback.

bq. Do we need to check the directory permission?

I think we do. We need to make sure the directory has the right permissions.

bq. If we want to check permission, Can we change the permission if the 
permission doesn't match?

I do not think we need to do that. If we really wanted to, just changing the
permission would not be enough: we might need to go through all the
sub-directories and do the necessary checks there as well, and that does not
sound easy. I am thinking that we just keep it this way (check, but do not
change, the permission). If we have further requirements, we can spend more
time investigating it.




> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch
>
>
> When using LinuxContainerExecutor to do startLocalizer, we use native code in
> container-executor.c.
> {code}
>  if (stat(npath, &sb) != 0) {
>    if (mkdir(npath, perm) != 0) {
> {code}
> We use a check-and-create approach to create the appDir under /usercache.
> But if two containers try to do this at the same time, a race
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177326#comment-14177326
 ] 

Zhijie Shen commented on YARN-2673:
---

+1 will commit the patch

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2714) Localizer thread might stuck if NM is OOM

2014-10-20 Thread Ming Ma (JIRA)
Ming Ma created YARN-2714:
-

 Summary: Localizer thread might stuck if NM is OOM
 Key: YARN-2714
 URL: https://issues.apache.org/jira/browse/YARN-2714
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma


When the NM JVM runs out of memory, it is normally an uncaught exception and the
process exits. But the RPC server used by the node manager catches OutOfMemoryError
to give GC a chance to catch up, so the NM doesn't need to exit and can recover from
the OutOfMemoryError situation.

However, in some rare situations when this happens, one of the NM localizer
threads didn't get the RPC response from the node manager and just waited there. The
reason the node manager RPC server doesn't respond is that the RPC
server responder thread swallowed the OutOfMemoryError and didn't process the
outstanding RPC response. On the RPC client side, the RPC timeout is set to 0,
and the client relies on pings to detect RPC server availability.

{noformat}
Thread 481 (LocalizerRunner for container_1413487737702_2948_01_013383):
  State: WAITING
  Blocked count: 27
  Waited count: 84
  Waiting on org.apache.hadoop.ipc.Client$Call@6be5add3
  Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
org.apache.hadoop.ipc.Client.call(Client.java:1396)
org.apache.hadoop.ipc.Client.call(Client.java:1363)

org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
com.sun.proxy.$Proxy36.heartbeat(Unknown Source)

org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)

org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)

org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)

org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)

org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:995)
{noformat}


The consequence of this depends on which ContainerExecutor the NM uses. If it uses
DefaultContainerExecutor, given that its startLocalizer method is synchronized, it
will block other localizer threads. If you use LinuxContainerExecutor, at
least other localizer threads can still proceed, but in theory it can slowly
drain all available localizer threads.


There are a couple of ways to fix it. Some of these fixes are complementary.

1. Fix it at the hadoop-common layer. It seems an RPC server hosted by worker services
such as the NM doesn't really need to catch OutOfMemoryError; the service JVM can
just exit. Even for the NN and RM, given we have HA, it might be ok to do so.
2. Set an RPC timeout at the HadoopYarnProtoRPC layer so that all YARN clients will
time out if the RPC server drops the response.
3. Fix it at the yarn localization service. For example,
a) Fix DefaultContainerExecutor so that synchronization isn't required for the
startLocalizer method.
b) The download executor thread used by ContainerLocalizer currently catches all
exceptions. We can fix ContainerLocalizer so that when the download executor thread
catches OutOfMemoryError, it exits its host process (see the sketch after this
description).


IMHO, fixing it at the RPC server layer is better as it addresses other scenarios.
Appreciate any input others might have.
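
For illustration, a minimal sketch of option 3.b: wrap a download task so that an OutOfMemoryError terminates the localizer process instead of being swallowed. The class and names are illustrative, not the actual ContainerLocalizer code:
{code}
import java.util.concurrent.Callable;

// Sketch only: wrap a download task so that an OutOfMemoryError kills the
// localizer process instead of being swallowed by the executor.
public class ExitOnOomCallable<T> implements Callable<T> {
  private final Callable<T> delegate;

  public ExitOnOomCallable(Callable<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T call() throws Exception {
    try {
      return delegate.call();
    } catch (OutOfMemoryError oom) {
      // halt() skips shutdown hooks; the NM will see the localizer die and
      // fail the container instead of hanging forever on a lost heartbeat.
      oom.printStackTrace();
      Runtime.getRuntime().halt(1);
      throw oom; // unreachable, keeps the compiler satisfied
    }
  }
}
{code}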





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177357#comment-14177357
 ] 

Hudson commented on YARN-2673:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6293 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6293/])
YARN-2673. Made timeline client put APIs retry if ConnectException happens. 
Contributed by Li Lu. (zjshen: rev 89427419a3c5eaab0f73bae98d675979b9efab5f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177366#comment-14177366
 ] 

Hadoop QA commented on YARN-2582:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675907/YARN-2582.3.patch
  against trunk revision d5084b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5463//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5463//console

This message is automatically generated.

> Log related CLI and Web UI changes for Aggregated Logs in LRS
> -
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for
> Long Running Services. The log CLI and related Web UI should be modified
> accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177383#comment-14177383
 ] 

Hadoop QA commented on YARN-2209:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675884/YARN-2209.6.patch
  against trunk revision d5084b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1288 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5461//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5461//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5461//console

This message is automatically generated.

> Replace AM resync/shutdown command with corresponding exceptions
> 
>
> Key: YARN-2209
> URL: https://issues.apache.org/jira/browse/YARN-2209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, 
> YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch
>
>
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
> application to re-register on RM restart. we should do the same for 
> AMS#allocate call also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177387#comment-14177387
 ] 

Hadoop QA commented on YARN-2690:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675908/YARN-2690.002.patch
  against trunk revision d5084b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5464//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5464//console

This message is automatically generated.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177392#comment-14177392
 ] 

Zhijie Shen commented on YARN-2582:
---

+1 for the last patch. Will commit it.

> Log related CLI and Web UI changes for Aggregated Logs in LRS
> -
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for
> Long Running Services. The log CLI and related Web UI should be modified
> accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-10-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2690:

Attachment: YARN-2690.002.patch

Uploading again to kick jenkins. The previous failures were bind-related issues,
seemingly unrelated to this patch. Reran the tests and they passed locally.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-10-20 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-1964:
--
Attachment: YARN-1964.patch

This patch simplifies the use case by exposing only one docker configuration
param: the image. Now the user must configure the image completely so that all
required resources and environment variables are defined in the image.


> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions

2014-10-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2209:
--
Attachment: YARN-2209.6.patch

Fixed test failures

> Replace AM resync/shutdown command with corresponding exceptions
> 
>
> Key: YARN-2209
> URL: https://issues.apache.org/jira/browse/YARN-2209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, 
> YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch
>
>
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
> application to re-register on RM restart. we should do the same for 
> AMS#allocate call also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for Aggregated Logs in LRS

2014-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177464#comment-14177464
 ] 

Hudson commented on YARN-2582:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6294 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6294/])
YARN-2582. Fixed Log CLI and Web UI for showing aggregated logs of LRS. 
Contributed Xuan Gong. (zjshen: rev e90718fa5a0e7c18592af61534668acebb9db51b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java


> Log related CLI and Web UI changes for Aggregated Logs in LRS
> -
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2582.1.patch, YARN-2582.2.patch, YARN-2582.3.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for
> Long Running Services. The log CLI and related Web UI should be modified
> accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2715:
-

 Summary: Proxy user is problem for RPC interface if 
yarn.resourcemanager.webapp.proxyuser is not set.
 Key: YARN-2715
 URL: https://issues.apache.org/jira/browse/YARN-2715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC
interface, it's not going to work, because ProxyUsers#sip is a singleton per
daemon. After YARN-2656, the RM has two channels that want to set this
configuration: RPC and HTTP. The RPC interface sets it first by reading
hadoop.proxyuser, but it is overwritten by the HTTP interface, which sets it to empty
because yarn.resourcemanager.webapp.proxyuser doesn't exist.

The fix for it could be similar to what we've done for YARN-2676: make the HTTP
interface source hadoop.proxyuser first anyway, then
yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177477#comment-14177477
 ] 

zhihai xu commented on YARN-2701:
-

Hi [~xgong], thanks for the detailed explanation. It sounds
reasonable to me.

Some nits:
1. Since we only check the permission for the final directory component,
I think we also need to check finalComponent in the first check_permission call.
Change
{code}
} else if (check_permission(sb.st_mode, perm) == -1) {
{code}
to
{code}
} else if (finalComponent == 1 && check_permission(sb.st_mode, perm) == -1) {
{code}

2. Can we create a new function check_dir to remove the duplicate code that
verifies the existing directory in two places?
We can also remove the function check_permission by moving the check_permission
code into check_dir.
This is the check_dir function:
{code}
int check_dir(char* npath, mode_t st_mode, mode_t desired, int finalComponent) {
  // Check whether it is a directory
  if (!S_ISDIR(st_mode)) {
    fprintf(LOGFILE, "Path %s is file not dir\n", npath);
    return -1;
  } else if (finalComponent == 1) {
    int filePermInt = st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
    int desiredInt = desired & (S_IRWXU | S_IRWXG | S_IRWXO);
    if (filePermInt != desiredInt) {
      fprintf(LOGFILE, "Path %s does not have desired permission.\n", npath);
      return -1;
    }
  }
  return 0;
}
{code}

3. Can we move free(npath); from create_validate_dirs to mkdirs?
It would be better to free the memory in the same function (mkdirs) that
allocated the memory.
In mkdirs:
{code}
if (create_validate_dirs(npath, perm, path, 0) == -1) {
  free(npath);
  return -1;
}
{code}

4. A little more optimization to remove redundant code:
we can merge the two occurrences of
fprintf(LOGFILE, "Can't create directory %s in %s - %s\n", npath,
path, strerror(errno));
by guarding them with
if (errno != EEXIST || stat(npath, &sb) != 0) {

The code after the change would look like the following:
{code}
int create_validate_dir(char* npath, mode_t perm, char* path,
                        int finalComponent) {
  struct stat sb;
  if (stat(npath, &sb) != 0) {
    if (mkdir(npath, perm) != 0) {
      if (errno != EEXIST || stat(npath, &sb) != 0) {
        fprintf(LOGFILE, "Can't create directory %s in %s - %s\n", npath,
                path, strerror(errno));
        return -1;
      }
      // The directory npath should exist.
      if (check_dir(npath, sb.st_mode, perm, finalComponent) == -1) {
        return -1;
      }
    }
  } else if (check_dir(npath, sb.st_mode, perm, finalComponent) == -1) {
    return -1;
  }
  return 0;
}
{code}

5. Can we rename create_validate_dirs to create_validate_dir, since we
only create one directory in it?

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch
>
>
> When using LinuxContainerExecutor to do startLocalizer, we use native code in
> container-executor.c.
> {code}
>  if (stat(npath, &sb) != 0) {
>    if (mkdir(npath, perm) != 0) {
> {code}
> We use a check-and-create approach to create the appDir under /usercache.
> But if two containers try to do this at the same time, a race
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2709:

Attachment: YARN-2709-102014.patch

I've done a patch for this issue. In this patch, I refactored the retry logic
in the jersey retry filter and built a more generalized retry wrapper for the timeline
client. Both the jersey retry filter and the delegation token call can use this
wrapper to retry, according to the retry settings (added in YARN-2673).

To use the retry wrapper, the user only needs to implement a
TimelineClientRetryOp, providing a) the operation that should be retried and b)
a verifier that tells, for a given exception e, whether a retry should happen.

I've also added a unit test for retries on getting the delegation token.
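
For illustration, a minimal sketch of such a retry wrapper, assuming an op interface that combines the operation with an exception verifier; the names and structure are illustrative and may differ from the actual patch:
{code}
import java.io.IOException;

// Sketch of a generic retry wrapper: retry an operation up to maxRetries
// times, as long as the verifier says the failure is retriable.
public class TimelineRetryHelper {

  public interface RetryOp<T> {
    T run() throws IOException;          // the operation to retry
    boolean shouldRetryOn(Exception e);  // verifier for a given exception
  }

  private final int maxRetries;
  private final long retryIntervalMs;

  public TimelineRetryHelper(int maxRetries, long retryIntervalMs) {
    this.maxRetries = maxRetries;
    this.retryIntervalMs = retryIntervalMs;
  }

  public <T> T retryOn(RetryOp<T> op) throws IOException {
    int attempt = 0;
    while (true) {
      try {
        return op.run();
      } catch (IOException e) {
        attempt++;
        if (attempt > maxRetries || !op.shouldRetryOn(e)) {
          throw e; // give up: retries exhausted or not a retriable failure
        }
        try {
          Thread.sleep(retryIntervalMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while retrying", ie);
        }
      }
    }
  }
}
{code}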

> Add retry for timeline client getDelegationToken method
> ---
>
> Key: YARN-2709
> URL: https://issues.apache.org/jira/browse/YARN-2709
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2709-102014.patch
>
>
> As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
> for secured clusters. This means if the timeline server is not available, a 
> timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177507#comment-14177507
 ] 

Hadoop QA commented on YARN-1964:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675937/YARN-1964.patch
  against trunk revision 8942741.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5466//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5466//console

This message is automatically generated.

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177527#comment-14177527
 ] 

Hadoop QA commented on YARN-2690:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675930/YARN-2690.002.patch
  against trunk revision 8942741.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5465//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5465//console

This message is automatically generated.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server

2014-10-20 Thread chang li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chang li updated YARN-2556:
---
Attachment: yarn2556_wip.patch

Thanks [~airbots] for the substantial early work! I have moved the test job
into the mapreduce jobclient tests to avoid a circular dependency. I have tested the
patch, and it successfully shows the write time, write counters, and writes
per second. I will continue to work on it to add more measurement metrics
such as transaction rates, IO rates, and memory usage.
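
For illustration, a rough sketch of the kind of measurement loop involved, assuming the public TimelineClient API; the entity type/id values are placeholders, and the real tool runs inside a mapreduce job rather than a standalone main():
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: time N timeline writes and report writes per second.
public class TimelineWriteBench {
  public static void main(String[] args) throws Exception {
    int numEntities = args.length > 0 ? Integer.parseInt(args[0]) : 1000;

    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      long start = System.nanoTime();
      for (int i = 0; i < numEntities; i++) {
        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("BENCHMARK_ENTITY");   // placeholder type
        entity.setEntityId("entity_" + i);           // placeholder id
        entity.setStartTime(System.currentTimeMillis());
        client.putEntities(entity);
      }
      double seconds = (System.nanoTime() - start) / 1e9;
      System.out.printf("wrote %d entities in %.2fs (%.1f writes/sec)%n",
          numEntities, seconds, numEntities / seconds);
    } finally {
      client.stop();
    }
  }
}
{code}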

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: chang li
> Attachments: YARN-2556-WIP.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177546#comment-14177546
 ] 

Hadoop QA commented on YARN-2709:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12675947/YARN-2709-102014.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1267 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5469//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5469//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5469//console

This message is automatically generated.

> Add retry for timeline client getDelegationToken method
> ---
>
> Key: YARN-2709
> URL: https://issues.apache.org/jira/browse/YARN-2709
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2709-102014.patch
>
>
> As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
> for secured clusters. This means if the timeline server is not available, a 
> timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2714) Localizer thread might stuck if NM is OOM

2014-10-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177551#comment-14177551
 ] 

zhihai xu commented on YARN-2714:
-

YARN-2578 will address item 2; it tries to fix the RPC layer to set a 1-minute timeout.
For me, 3.b would be a good low-risk fix.
3.a would also be a good optimization.
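
For illustration only, a minimal sketch of what item 2 could look like from the client side; the configuration key name and the 1-minute value are assumptions, not taken from YARN-2578:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: give RPC clients a finite timeout so a hung responder thread
// cannot block a localizer forever. The key name "ipc.client.rpc-timeout.ms"
// is an assumption here, not taken from YARN-2578.
public final class RpcTimeoutExample {
  public static Configuration withRpcTimeout(Configuration conf) {
    Configuration copy = new Configuration(conf);
    // 0 means "no timeout, rely on ping"; a positive value makes the client
    // give up after that many milliseconds.
    copy.setInt("ipc.client.rpc-timeout.ms", 60000);
    return copy;
  }

  private RpcTimeoutExample() {
  }
}
{code}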

> Localizer thread might stuck if NM is OOM
> -
>
> Key: YARN-2714
> URL: https://issues.apache.org/jira/browse/YARN-2714
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>
> When the NM JVM runs out of memory, it is normally an uncaught exception and the
> process exits. But the RPC server used by the node manager catches
> OutOfMemoryError to give GC a chance to catch up, so the NM doesn't need to exit
> and can recover from the OutOfMemoryError situation.
> However, in some rare situations when this happens, one of the NM localizer
> threads didn't get the RPC response from the node manager and just waited there.
> The reason the node manager RPC server doesn't respond is that the RPC
> server responder thread swallowed the OutOfMemoryError and didn't process the
> outstanding RPC response. On the RPC client side, the RPC timeout is set to 0,
> and the client relies on pings to detect RPC server availability.
> {noformat}
> Thread 481 (LocalizerRunner for container_1413487737702_2948_01_013383):
>   State: WAITING
>   Blocked count: 27
>   Waited count: 84
>   Waiting on org.apache.hadoop.ipc.Client$Call@6be5add3
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:503)
> org.apache.hadoop.ipc.Client.call(Client.java:1396)
> org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> com.sun.proxy.$Proxy36.heartbeat(Unknown Source)
> 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
> 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
> 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:995)
> {noformat}
> The consequence of this depends on which ContainerExecutor the NM uses. If it
> uses DefaultContainerExecutor, given that its startLocalizer method is
> synchronized, it will block other localizer threads. If you use
> LinuxContainerExecutor, at least other localizer threads can still proceed,
> but in theory it can slowly drain all available localizer threads.
> There are a couple of ways to fix it. Some of these fixes are complementary.
> 1. Fix it at the hadoop-common layer. It seems an RPC server hosted by worker
> services such as the NM doesn't really need to catch OutOfMemoryError; the
> service JVM can just exit. Even for the NN and RM, given we have HA, it might
> be ok to do so.
> 2. Set an RPC timeout at the HadoopYarnProtoRPC layer so that all YARN clients
> will time out if the RPC server drops the response.
> 3. Fix it at the yarn localization service. For example,
> a) Fix DefaultContainerExecutor so that synchronization isn't required for the
> startLocalizer method.
> b) The download executor thread used by ContainerLocalizer currently catches all
> exceptions. We can fix ContainerLocalizer so that when the download executor
> thread catches OutOfMemoryError, it exits its host process.
> IMHO, fixing it at the RPC server layer is better as it addresses other scenarios.
> Appreciate any input others might have.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177550#comment-14177550
 ] 

Hadoop QA commented on YARN-2703:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675881/YARN-2703.1.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5468//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5468//console

This message is automatically generated.

> Add logUploadedTime into LogValue for better display
> 
>
> Key: YARN-2703
> URL: https://issues.apache.org/jira/browse/YARN-2703
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2703.1.patch
>
>
> Right now, the container can upload its logs multiple times. Sometimes,
> containers write different logs into the same log file. After the log
> aggregation, when we query those logs, it will show:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> The same files could be displayed multiple times, but we cannot figure out
> which logs came first. We could add an extra logUploadedTime to give users a
> better understanding of the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177562#comment-14177562
 ] 

Xuan Gong commented on YARN-2701:
-

addressed all comments

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch
>
>
> When using LinuxContainerExecutor to do startLocalizer, we use native code in
> container-executor.c.
> {code}
>  if (stat(npath, &sb) != 0) {
>    if (mkdir(npath, perm) != 0) {
> {code}
> We use a check-and-create approach to create the appDir under /usercache.
> But if two containers try to do this at the same time, a race
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2701:

Attachment: YARN-2701.4.patch

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch
>
>
> When using LinuxContainerExecutor to do startLocalizer, we use native code in
> container-executor.c.
> {code}
>  if (stat(npath, &sb) != 0) {
>    if (mkdir(npath, perm) != 0) {
> {code}
> We use a check-and-create approach to create the appDir under /usercache.
> But if two containers try to do this at the same time, a race
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177568#comment-14177568
 ] 

Vinod Kumar Vavilapalli commented on YARN-2715:
---

bq. The fix for it could be similar to what we've done for YARN-2676: make the
HTTP interface source hadoop.proxyuser first anyway, then
yarn.resourcemanager.webapp.proxyuser.
This is getting complex. I propose the following (a rough sketch of the fallback is below):
 - Have a single yarn.resourcemanager.proxyuser.* prefix
 - Change both the YARN RM RPC server and webapps to use the above prefix if
explicitly configured. Otherwise, fall back to the common configs.
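
For illustration, a rough sketch of the proposed fallback, assuming the usual .hosts suffix for proxyuser keys; the helper class and the user name are illustrative only:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: prefer an RM-specific proxyuser prefix and fall back to the
// common hadoop.proxyuser settings when it is absent.
public final class RmProxyUserConfig {
  private static final String RM_PREFIX = "yarn.resourcemanager.proxyuser.";
  private static final String COMMON_PREFIX = "hadoop.proxyuser.";

  /** Returns e.g. the allowed hosts for a proxy user, with fallback. */
  public static String getProxyUserHosts(Configuration conf, String user) {
    String rmKey = RM_PREFIX + user + ".hosts";
    String commonKey = COMMON_PREFIX + user + ".hosts";
    String value = conf.get(rmKey);
    return value != null ? value : conf.get(commonKey);
  }

  private RmProxyUserConfig() {
  }
}
{code}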

> Proxy user is problem for RPC interface if 
> yarn.resourcemanager.webapp.proxyuser is not set.
> 
>
> Key: YARN-2715
> URL: https://issues.apache.org/jira/browse/YARN-2715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
>
> After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC
> interface, it's not going to work, because ProxyUsers#sip is a singleton per
> daemon. After YARN-2656, the RM has two channels that want to set this
> configuration: RPC and HTTP. The RPC interface sets it first by reading
> hadoop.proxyuser, but it is overwritten by the HTTP interface, which sets it to
> empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
> The fix for it could be similar to what we've done for YARN-2676: make the
> HTTP interface source hadoop.proxyuser first anyway, then
> yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177584#comment-14177584
 ] 

Hadoop QA commented on YARN-2209:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675943/YARN-2209.6.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1283 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5467//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5467//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5467//console

This message is automatically generated.

> Replace AM resync/shutdown command with corresponding exceptions
> 
>
> Key: YARN-2209
> URL: https://issues.apache.org/jira/browse/YARN-2209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, 
> YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch
>
>
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
> application to re-register on RM restart. we should do the same for 
> AMS#allocate call also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177601#comment-14177601
 ] 

Hadoop QA commented on YARN-2701:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675954/YARN-2701.4.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5470//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5470//console

This message is automatically generated.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2709:

Attachment: YARN-2709-102014-1.patch

Added a tag to suppress the warnings when getting the delegation token. 

> Add retry for timeline client getDelegationToken method
> ---
>
> Key: YARN-2709
> URL: https://issues.apache.org/jira/browse/YARN-2709
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch
>
>
> As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
> for secured clusters. This means if the timeline server is not available, a 
> timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request, and node label expression only specified when resourceName=ANY

2014-10-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2694:
-
Attachment: YARN-2694-20141020-1.patch

Attached ver.1 patch and kicked off Jenkins.

> Ensure only single node labels specified in resource request, and node label 
> expression only specified when resourceName=ANY
> 
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2694-20141020-1.patch
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially complete: a node label expression specified in a ResourceRequest is 
> only respected when it is specified at the ANY level, and a ResourceRequest 
> with multiple node labels makes user limit computation tricky.
> For now we need to temporarily disable these; the changes include:
> - AMRMClient
> - ApplicationMasterService



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177627#comment-14177627
 ] 

zhihai xu commented on YARN-2701:
-

Thanks [~xgong], the latest patch looks mostly good to me. Just one small issue 
in function check_dir:
"return 0;" should be outside the inner "}".
change:
{code}
return -1;
}
return 0;
  }
}
{code}
to
{code}
return -1;
}
  }
  return 0;
}
{code}
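To illustrate the intended control flow, here is a minimal, hypothetical 
skeleton (the function name and the per-entry check are made up for 
illustration; this is not the actual check_dir body): the early "return -1" 
stays inside the loop, and "return 0" is reached only after every entry has 
been checked.
{code}
#include <stdio.h>
#include <stddef.h>

/* Hypothetical skeleton only: fail fast inside the loop, return success
 * only after every entry has been checked. */
static int check_entries(const char *entries[], int count) {
  for (int i = 0; i < count; i++) {
    if (entries[i] == NULL) {   /* stand-in for the real per-entry check */
      return -1;                /* error: return from inside the loop */
    }
  }
  return 0;                     /* success: only after the loop finishes */
}

int main(void) {
  const char *ok[] = { "/usercache/user/appcache" };
  printf("%d\n", check_entries(ok, 1));   /* prints 0 */
  return 0;
}
{code}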

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177631#comment-14177631
 ] 

Zhijie Shen commented on YARN-2715:
---

Vinod, thanks for the comments. It makes sense to me.

> Proxy user is problem for RPC interface if 
> yarn.resourcemanager.webapp.proxyuser is not set.
> 
>
> Key: YARN-2715
> URL: https://issues.apache.org/jira/browse/YARN-2715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
>
> After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC 
> interface, it's not going to work, because ProxyUsers#sip is a singleton per 
> daemon. After YARN-2656, RM has both channels that want to set this 
> configuration: RPC and HTTP. RPC interface sets it first by reading 
> hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
> empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
> The fix for it could be similar to what we've done for YARN-2676: make the 
> HTTP interface anyway source hadoop.proxyuser first, then 
> yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2701:

Attachment: YARN-2701.5.patch

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177636#comment-14177636
 ] 

Xuan Gong commented on YARN-2701:
-

Good catch. Fixed.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2014-10-20 Thread Jian He (JIRA)
Jian He created YARN-2716:
-

 Summary: Refactor ZKRMStateStore retry code with Apache Curator
 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He


Per a suggestion by [~kasha] in YARN-2131, it would be nice to use Apache 
Curator to simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2014-10-20 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-2716:
---

Assignee: Robert Kanter

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Robert Kanter
>
> Per a suggestion by [~kasha] in YARN-2131, it would be nice to use Apache 
> Curator to simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-10-20 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177647#comment-14177647
 ] 

Subru Krishnan commented on YARN-2690:
--

Thanks [~adhoot] for updating the patch. +1 from my side.

Couple of minor nits:
  * We could have a protected _ReservationSchedulerConfiguration_ variable in 
_AbstractReservationSystem_ to avoid invoking 
_ReservationSchedulerConfiguration reservationConfig = 
getReservationSchedulerConfiguration()_ everywhere.
  * It'll be good to have some Javadocs for _ReservationSchedulerConfiguration_ 
describing what the reservation system configs are.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177655#comment-14177655
 ] 

zhihai xu commented on YARN-2701:
-

Thanks [~xgong], the latest patch LGTM.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2715:
--
Attachment: YARN-2715.1.patch

Made a patch with the following changes:

1. Use "yarn.resourcemanager.proxyuser" instead of 
"yarn.resourcemanager.webapp.proxyuser" as the RM proxy user prefix for both 
RPC and HTTP channel.

2. Before setting ProxyUsers#sip, use "yarn.resourcemanager.proxyuser" to 
overwrite "hadoop.proxyuser" configurations if it exists.

3. Always read configurations with "hadoop.proxyuser" for consistency.

> Proxy user is problem for RPC interface if 
> yarn.resourcemanager.webapp.proxyuser is not set.
> 
>
> Key: YARN-2715
> URL: https://issues.apache.org/jira/browse/YARN-2715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2715.1.patch
>
>
> After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC 
> interface, it's not going to work, because ProxyUsers#sip is a singleton per 
> daemon. After YARN-2656, RM has both channels that want to set this 
> configuration: RPC and HTTP. RPC interface sets it first by reading 
> hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
> empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
> The fix for it could be similar to what we've done for YARN-2676: make the 
> HTTP interface anyway source hadoop.proxyuser first, then 
> yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177663#comment-14177663
 ] 

Hadoop QA commented on YARN-2709:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12675965/YARN-2709-102014-1.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5472//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5472//console

This message is automatically generated.

> Add retry for timeline client getDelegationToken method
> ---
>
> Key: YARN-2709
> URL: https://issues.apache.org/jira/browse/YARN-2709
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch
>
>
> As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
> for secured clusters. This means if the timeline server is not available, a 
> timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177675#comment-14177675
 ] 

Hadoop QA commented on YARN-2701:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675971/YARN-2701.5.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5473//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5473//console

This message is automatically generated.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2703) Add logUploadedTime into LogValue for better display

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2703:

Attachment: YARN-2703.2.patch

Fixed test case failures.

> Add logUploadedTime into LogValue for better display
> 
>
> Key: YARN-2703
> URL: https://issues.apache.org/jira/browse/YARN-2703
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2703.1.patch, YARN-2703.2.patch
>
>
> Right now, a container can upload its logs multiple times. Sometimes, 
> containers write different logs into the same log file.  After log 
> aggregation, when we query those logs, it will show:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> The same files could be displayed multiple times, but we cannot figure out 
> which logs came first. We could add an extra logUploadedTime to give users a 
> better understanding of the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2692) ktutil test hanging on some machines/ktutil versions

2014-10-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2692:

Hadoop Flags: Reviewed

+1 for the patch.  I agree that we're not really losing any test coverage by 
removing this.  {{TestSecureRegistry}} will make use of the same keytab file 
implicitly.

> ktutil test hanging on some machines/ktutil versions
> 
>
> Key: YARN-2692
> URL: https://issues.apache.org/jira/browse/YARN-2692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2692-001.patch
>
>
> a couple of the registry security tests run native {{ktutil}}; this is 
> primarily to debug the keytab generation. [~cnauroth] reports that some 
> versions of {{kinit}} hang. Fix: rm the tests. [YARN-2689]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request, and node label expression only specified when resourceName=ANY

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177712#comment-14177712
 ] 

Hadoop QA commented on YARN-2694:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12675967/YARN-2694-20141020-1.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5471//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5471//console

This message is automatically generated.

> Ensure only single node labels specified in resource request, and node label 
> expression only specified when resourceName=ANY
> 
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2694-20141020-1.patch
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially complete: a node label expression specified in a ResourceRequest is 
> only respected when it is specified at the ANY level, and a ResourceRequest 
> with multiple node labels makes user limit computation tricky.
> For now we need to temporarily disable these; the changes include:
> - AMRMClient
> - ApplicationMasterService



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2194) Add Cgroup support for RedHat 7

2014-10-20 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2194:
--
Attachment: YARN-2194-1.patch

A preliminary patch that implements systemd-based CPU resource isolation for 
RedHat 7.
A summary:
(1) Create a new resource handler, SystemdLCEResourceHandler. Users can enable 
this handler by configuring the field 
"yarn.nodemanager.linux-container-executor.resources-handler.class".
(2) For each container, create one slice and one scope. The scope is put inside 
the slice, and cpuShare isolation is also attached to the slice.  All 
containers' slices are organized under a root slice (named "hadoop_yarn.slice" 
by default).

Will add some test cases later.
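As a rough, hypothetical illustration of the idea only: a sufficiently 
privileged helper could start the root slice by shelling out to systemctl. The 
helper function, its name, and the use of system() below are assumptions for 
illustration; only the default slice name "hadoop_yarn.slice" comes from the 
summary above, and this is not the code in YARN-2194-1.patch.
{code}
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: start a slice unit by shelling out to systemctl.
 * Needs sufficient privilege (root, or a setuid helper such as
 * container-executor); not the actual patch code. */
static int start_slice(const char *slice) {
  char cmd[256];
  int n = snprintf(cmd, sizeof(cmd), "systemctl start %s", slice);
  if (n < 0 || n >= (int) sizeof(cmd)) {
    return -1;                        /* slice name too long */
  }
  return system(cmd) == 0 ? 0 : -1;   /* non-zero exit code means failure */
}

int main(void) {
  /* "hadoop_yarn.slice" is the default root slice name from the summary. */
  return start_slice("hadoop_yarn.slice") == 0 ? 0 : 1;
}
{code}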

> Add Cgroup support for RedHat 7
> ---
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2194-1.patch
>
>
> In previous versions of RedHat, we could build custom cgroup hierarchies 
> using the cgconfig command from the libcgroup package. Starting with RedHat 7, 
> the libcgroup package is deprecated and its use is not recommended, since it 
> can easily create conflicts with the default cgroup hierarchy. Instead, 
> systemd is provided and recommended for cgroup management. We need to add 
> support for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177734#comment-14177734
 ] 

Hadoop QA commented on YARN-2703:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675982/YARN-2703.2.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5475//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5475//console

This message is automatically generated.

> Add logUploadedTime into LogValue for better display
> 
>
> Key: YARN-2703
> URL: https://issues.apache.org/jira/browse/YARN-2703
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2703.1.patch, YARN-2703.2.patch
>
>
> Right now, a container can upload its logs multiple times. Sometimes, 
> containers write different logs into the same log file.  After log 
> aggregation, when we query those logs, it will show:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> The same files could be displayed multiple times, but we cannot figure out 
> which logs came first. We could add an extra logUploadedTime to give users a 
> better understanding of the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2701:

Attachment: YARN-2701.6.patch

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177750#comment-14177750
 ] 

Xuan Gong commented on YARN-2701:
-

Had some offline discussion with [~jianhan]. We think that, for now, reverting 
the previous method changes might be the safest way to solve this issue.

Uploaded a new patch to do that.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177752#comment-14177752
 ] 

Hadoop QA commented on YARN-2715:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675975/YARN-2715.1.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5474//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5474//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5474//console

This message is automatically generated.

> Proxy user is problem for RPC interface if 
> yarn.resourcemanager.webapp.proxyuser is not set.
> 
>
> Key: YARN-2715
> URL: https://issues.apache.org/jira/browse/YARN-2715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2715.1.patch
>
>
> After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC 
> interface, it's not going to work, because ProxyUsers#sip is a singleton per 
> daemon. After YARN-2656, RM has both channels that want to set this 
> configuration: RPC and HTTP. RPC interface sets it first by reading 
> hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
> empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
> The fix for it could be similar to what we've done for YARN-2676: make the 
> HTTP interface anyway source hadoop.proxyuser first, then 
> yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177753#comment-14177753
 ] 

Xuan Gong commented on YARN-2161:
-

[~decster] [~aw] 
To fix YARN-2701, I need to revert the native code changes for mkdirs in 
container-executor.c.

> Fix build on macosx: YARN parts
> ---
>
> Key: YARN-2161
> URL: https://issues.apache.org/jira/browse/YARN-2161
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 2.6.0
>
> Attachments: YARN-2161.v1.patch, YARN-2161.v2.patch
>
>
> When compiling on macosx with -Pnative, there are several warning and errors, 
> fix this would help hadoop developers with macosx env. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177756#comment-14177756
 ] 

Xuan Gong commented on YARN-2161:
-

The changes for mkdirs in container-executor.c introduce a race condition when 
two containers try to check for and create the same directory at the same time.
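For reference, the usual way to avoid such a check-then-create race is to call 
mkdir() first and treat EEXIST as success. The sketch below shows that general 
pattern only; it is not the YARN-2701 change, which instead reverts to the 
previously tested method.
{code}
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Illustrative only (not the YARN-2701 fix): attempt mkdir() first and treat
 * EEXIST as success, so two localizers racing to create the same appDir
 * cannot fail each other. */
static int mkdir_if_missing(const char *npath, mode_t perm) {
  if (mkdir(npath, perm) != 0 && errno != EEXIST) {
    return -1;   /* genuine failure: permissions, missing parent, ... */
  }
  return 0;      /* created now, or already created by a racing container */
}

int main(void) {
  /* "/tmp/yarn-appdir-demo" is just a demo path, not a real NM directory. */
  return mkdir_if_missing("/tmp/yarn-appdir-demo", 0750) == 0 ? 0 : 1;
}
{code}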

> Fix build on macosx: YARN parts
> ---
>
> Key: YARN-2161
> URL: https://issues.apache.org/jira/browse/YARN-2161
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 2.6.0
>
> Attachments: YARN-2161.v1.patch, YARN-2161.v2.patch
>
>
> When compiling on macosx with -Pnative, there are several warning and errors, 
> fix this would help hadoop developers with macosx env. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177761#comment-14177761
 ] 

Xuan Gong commented on YARN-2701:
-

Sorry, that was an online discussion with [~jianhe]. And thanks for the review, [~zxu].

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177765#comment-14177765
 ] 

Jian He commented on YARN-2701:
---

Since the previous method has been used and tested thoroughly, I also prefer 
reverting the change to solve the problem for now. Thanks [~zxu] for 
reviewing the previous patch!

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2715:
--
Attachment: YARN-2715.2.patch

Fix the findbugs warning and the test failure

> Proxy user is problem for RPC interface if 
> yarn.resourcemanager.webapp.proxyuser is not set.
> 
>
> Key: YARN-2715
> URL: https://issues.apache.org/jira/browse/YARN-2715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2715.1.patch, YARN-2715.2.patch
>
>
> After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC 
> interface, it's not going to work, because ProxyUsers#sip is a singleton per 
> daemon. After YARN-2656, RM has both channels that want to set this 
> configuration: RPC and HTTP. RPC interface sets it first by reading 
> hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
> empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
> The fix for it could be similar to what we've done for YARN-2676: make the 
> HTTP interface anyway source hadoop.proxyuser first, then 
> yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177799#comment-14177799
 ] 

Hadoop QA commented on YARN-2701:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675996/YARN-2701.6.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5476//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5476//console

This message is automatically generated.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177814#comment-14177814
 ] 

Jian He commented on YARN-2701:
---

+1

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-10-20 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177823#comment-14177823
 ] 

Beckham007 commented on YARN-2194:
--

Do startSystemdSlice/stopSystemdSlice need root privileges? Should we let 
container-executor run "sudo systemctl start " instead?

> Add Cgroup support for RedHat 7
> ---
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2194-1.patch
>
>
> In previous versions of RedHat, we could build custom cgroup hierarchies 
> using the cgconfig command from the libcgroup package. Starting with RedHat 7, 
> the libcgroup package is deprecated and its use is not recommended, since it 
> can easily create conflicts with the default cgroup hierarchy. Instead, 
> systemd is provided and recommended for cgroup management. We need to add 
> support for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177834#comment-14177834
 ] 

zhihai xu commented on YARN-2701:
-

Thanks [~jianhe], the latest patch LGTM.

> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2715) Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177859#comment-14177859
 ] 

Hadoop QA commented on YARN-2715:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676000/YARN-2715.2.patch
  against trunk revision e90718f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5477//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5477//console

This message is automatically generated.

> Proxy user is problem for RPC interface if 
> yarn.resourcemanager.webapp.proxyuser is not set.
> 
>
> Key: YARN-2715
> URL: https://issues.apache.org/jira/browse/YARN-2715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2715.1.patch, YARN-2715.2.patch
>
>
> After YARN-2656, if people set hadoop.proxyuser for the client<-->RM RPC 
> interface, it's not going to work, because ProxyUsers#sip is a singleton per 
> daemon. After YARN-2656, RM has both channels that want to set this 
> configuration: RPC and HTTP. RPC interface sets it first by reading 
> hadoop.proxyuser, but it is overwritten by HTTP interface, who sets it to 
> empty because yarn.resourcemanager.webapp.proxyuser doesn't exist.
> The fix for it could be similar to what we've done for YARN-2676: make the 
> HTTP interface anyway source hadoop.proxyuser first, then 
> yarn.resourcemanager.webapp.proxyuser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177866#comment-14177866
 ] 

Hudson commented on YARN-2701:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6297 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6297/])
YARN-2701. Potential race condition in startLocalizer when using 
LinuxContainerExecutor. Contributed by Xuan Gong (jianhe: rev 
2839365f230165222f63129979ea82ada79ec56e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
Missing file for YARN-2701 (jianhe: rev 
4fa1fb3193bf39fcb1bd7f8f8391a78f69c3c302)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockContainerLocalizer.java


> Potential race condition in startLocalizer when using LinuxContainerExecutor  
> --
>
> Key: YARN-2701
> URL: https://issues.apache.org/jira/browse/YARN-2701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
> YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch
>
>
> When using LinuxContainerExecutor do startLocalizer, we are using native code 
> container-executor.c. 
> {code}
>  if (stat(npath, &sb) != 0) {
>if (mkdir(npath, perm) != 0) {
> {code}
> We are using check and create method to create the appDir under /usercache. 
> But if there are two containers trying to do this at the same time, race 
> condition may happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-20 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177886#comment-14177886
 ] 

Rohith commented on YARN-2579:
--

bq. Under what conditions can resetDispatcher be called by two threads 
simultaneously?
resetDispatcher is called only once, inside a synchronized block 
(transitionToStandBy or transitionedToActive).

The problem here is:
*Thread-1:* just before stoppingActiveServices() is invoked from 
transitionToStandBy(), if an RMFatalEvent is thrown, RMFatalEventDispatcher 
waits to obtain the lock held by transitionToStandBy(), i.e. 
RMFatalEventDispatcher is BLOCKED on transitionToStandBy().
*Thread-2:* From the elector, transitionToStandBy() stops the dispatcher in the 
resetDispatcher() method. (Service)Dispatcher.stop() waits for the 
RMFatalEventDispatcher events to drain, but the "AsyncDispatcher event handler" 
is WAITING on the dispatcher thread to finish.


> Both RM's state is Active , but 1 RM is not really active.
> --
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2579.patch, YARN-2579.patch
>
>
> I encountered a situaltion where both RM's web page was able to access and 
> its state displayed as Active. But One of the RM's ActiveServices were 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177898#comment-14177898
 ] 

Zhijie Shen commented on YARN-2709:
---

[~gtCarrera], thanks for the patch. Here're some comments.

1. Any reason why TimelineClientRetryOp, TimelineClientConnectionRetry and 
TimelineJerseyRetryFilter are not private?

2. Redundant reference.
{code}
TimelineClientImpl.this.connectionRetry.retryOn(jerseyRetryOp);
{code}

3. Not sure why this code can't be put into run() directly? At least it 
shouldn't be public.
{code}
  public Token
getDelegationTokenInternal(final String renewer) throws IOException {
{code}

4. It's safer to create connectionRetry before retryFilter, because retryFilter 
may invoke connectionRetry, though it won't actually do so in practice.
{code}
  TimelineJerseyRetryFilter retryFilter = new TimelineJerseyRetryFilter();
  client = new Client(new URLConnectionClientHandler(
  new TimelineURLConnectionFactory()), cc);
  token = new DelegationTokenAuthenticatedURL.Token();
  client.addFilter(retryFilter);
  connectionRetry = new TimelineClientConnectionRetry(conf);
{code}

5. Unnecessary import in TimelineClientImpl

6. I believe the following mock is not necessary. The reason you want to 
add this code is HADOOP-11215; due to that issue, it will throw a cast 
exception here. Please leave a comment about the mock code below.
{code}
doThrow(new ConnectException("Connection refused")).when(client)
  .getDelegationTokenInternal(any(String.class));
{code}

7. That's not a meaningful renewer. You could use 
UserGroupInformation.getCurrentUser().getShortUserName() here.
{code}
  Token token = client
  .getDelegationToken("http://localhost:8/resource?delegation=";);
{code}

> Add retry for timeline client getDelegationToken method
> ---
>
> Key: YARN-2709
> URL: https://issues.apache.org/jira/browse/YARN-2709
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch
>
>
> As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
> for secured clusters. This means if the timeline server is not available, a 
> timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2709) Add retry for timeline client getDelegationToken method

2014-10-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2709:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-1530

> Add retry for timeline client getDelegationToken method
> ---
>
> Key: YARN-2709
> URL: https://issues.apache.org/jira/browse/YARN-2709
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2709-102014-1.patch, YARN-2709-102014.patch
>
>
> As mentioned in YARN-2673, we need to add retry mechanism to timeline client 
> for secured clusters. This means if the timeline server is not available, a 
> timeline client needs to retry to get a delegation token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177902#comment-14177902
 ] 

Jian He commented on YARN-1972:
---

I merged this into branch-2.6.

> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
> YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 and 
> the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * changes the DCE-created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command 
> of the winutils task instead of 'create'.
> * runs the localization as a standalone process instead of an in-process Java 
> method call. This in turn relies on the winutils createAsUser feature to run 
> the localization as the job user.
>  
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files.
> The WCE design came from a practical trial-and-error approach. I had 
> to iron out some issues around the Windows script shell limitations (command 
> line length) to get it to work, the biggest issue being the huge CLASSPATH 
> that is commonplace in Hadoop container executions. The job 
> container itself already deals with this via a so-called 'classpath 
> jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer launched 
> as a separate container, the same issue had to be resolved, and I used the 
> same 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set the 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
> and set the `yarn.nodemanager.windows-secure-container-executor.group` to a 
> Windows security group name that the nodemanager service principal is a 
> member of (the equivalent of the LCE 
> `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
> does not require any configuration outside of Hadoop's own yarn-site.xml.
> For the WCE to work, the nodemanager must run as a service principal that is 
> a member of the local Administrators group or LocalSystem. This is derived 
> from the need to invoke the LoadUserProfile API, which mentions these 
> requirements in its specification. This is in addition to the SE_TCB privilege 
> mentioned in YARN-1063, but this requirement automatically implies that the 
> SE_TCB privilege is held by the nodemanager. For the Linux speakers in the 
> audience, the requirement is basically to run the NM as root.
> h2. Dedicated high privilege Service
> Due to the high privilege required by the WCE, we had discussed the need to 
> isolate the high-privilege operations into a separate process, an 'executor' 
> service that is solely responsible for starting the containers (including the 
> localizer). The NM would have to authenticate, authorize and communicate with 
> this service via an IPC mechanism and use this service to launch the 
> containers. I still believe we'll end up deploying such a service, but the 
> effort to onboard such a new platform-specific service onto the project is 
> not trivial.
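
A minimal sketch of the two settings described under "Deployment Requirements" 
above, set programmatically purely for illustration; a real deployment would put 
them in yarn-site.xml, and the group name used below is hypothetical:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WceConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Switch the NM to the Windows secure container executor.
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
    // Hypothetical Windows group the NM service principal is a member of.
    conf.set("yarn.nodemanager.windows-secure-container-executor.group",
        "HadoopNMAdmins");
    System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
  }
}
{code}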



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-20 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2717:
---

 Summary: containerLogNotFound log shows multiple time for the same 
container
 Key: YARN-2717
 URL: https://issues.apache.org/jira/browse/YARN-2717
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Reporter: Xuan Gong
Assignee: Xuan Gong


containerLogNotFound is called multiple times when the container log for the 
same container does not exist



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2717:

Attachment: YARN-2717.1.patch

trivial patch

> containerLogNotFound log shows multiple time for the same container
> ---
>
> Key: YARN-2717
> URL: https://issues.apache.org/jira/browse/YARN-2717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2717.1.patch
>
>
> containerLogNotFound is called multiple times when the container log for the 
> same container does not exist



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177919#comment-14177919
 ] 

Hudson commented on YARN-1879:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6298 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6298/])
Missing file for YARN-1879 (jianhe: rev 
4a78a752286effbf1a0d8695325f9d7464a09fb4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java


> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.23.patch, YARN-1879.24.patch, YARN-1879.25.patch, 
> YARN-1879.26.patch, YARN-1879.27.patch, YARN-1879.28.patch, 
> YARN-1879.29.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177944#comment-14177944
 ] 

Hadoop QA commented on YARN-2717:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676023/YARN-2717.1.patch
  against trunk revision 4a78a75.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5478//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5478//console

This message is automatically generated.

> containerLogNotFound log shows multiple time for the same container
> ---
>
> Key: YARN-2717
> URL: https://issues.apache.org/jira/browse/YARN-2717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2717.1.patch
>
>
> containerLogNotFound is called multiple times when the container log for the 
> same container does not exist



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2717) containerLogNotFound log shows multiple time for the same container

2014-10-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177959#comment-14177959
 ] 

Zhijie Shen commented on YARN-2717:
---

+1, will commit the patch

> containerLogNotFound log shows multiple time for the same container
> ---
>
> Key: YARN-2717
> URL: https://issues.apache.org/jira/browse/YARN-2717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2717.1.patch
>
>
> containerLogNotFound is called multiple times when the container log for the 
> same container does not exist



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2707) Potential null dereference in FSDownload

2014-10-20 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov reassigned YARN-2707:
---

Assignee: Gera Shegalov

> Potential null dereference in FSDownload
> 
>
> Key: YARN-2707
> URL: https://issues.apache.org/jira/browse/YARN-2707
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Gera Shegalov
>Priority: Minor
>
> Here is related code in call():
> {code}
>   Pattern pattern = null;
>   String p = resource.getPattern();
>   if (p != null) {
> pattern = Pattern.compile(p);
>   }
>   unpack(new File(dTmp.toUri()), new File(dFinal.toUri()), pattern);
> {code}
> In unpack():
> {code}
> RunJar.unJar(localrsrc, dst, pattern);
> {code}
> unJar() would dereference the pattern without checking whether it is null.
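
For illustration only, one way the call site could guard against this; this is a 
sketch of a possible fix, not the patch for this JIRA, and it simply falls back 
to a match-everything pattern:
{code}
// Avoid handing a null Pattern to RunJar.unJar().
Pattern unpackPattern = (pattern != null) ? pattern : Pattern.compile(".*");
RunJar.unJar(localrsrc, dst, unpackPattern);
{code}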



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts

2014-10-20 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177969#comment-14177969
 ] 

Binglin Chang commented on YARN-2161:
-

Hi [~xgong], sorry for breaking the code. I see that in YARN-2701 you already 
had fix code but decided to revert it in the end to be safe; however, that 
breaks the mac build. How about using #ifdef to keep the old code when compiling 
against glibc > 2.10 (http://linux.die.net/man/2/openat) and using your fix 
otherwise?

> Fix build on macosx: YARN parts
> ---
>
> Key: YARN-2161
> URL: https://issues.apache.org/jira/browse/YARN-2161
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 2.6.0
>
> Attachments: YARN-2161.v1.patch, YARN-2161.v2.patch
>
>
> When compiling on macosx with -Pnative, there are several warning and errors, 
> fix this would help hadoop developers with macosx env. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

