[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792357#comment-13792357
 ] 

Chris Nauroth commented on YARN-445:


I haven't had a chance to look at this patch, but I did want to link to 
MAPREDUCE-5387.  We have discussed the possibility of using 
{{SetConsoleCtrlHandler}}/{{GenerateConsoleCtrlEvent}} to approximate SIGTERM 
on Windows.  (The current task termination logic on Windows is more like a 
SIGKILL.)  Perhaps this patch could be a foundation for that.
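
For illustration only, a rough JNA-based sketch of the Win32 calls mentioned above (the class, interface, and method names below are hypothetical, assume JNA 5.x, and gloss over the console-attachment details a real implementation would need):
{code}
import com.sun.jna.Native;
import com.sun.jna.win32.StdCallLibrary;

public class ConsoleCtrlSketch {
  // Hand-rolled mapping of the Win32 API; not part of Hadoop.
  public interface Kernel32Ctrl extends StdCallLibrary {
    Kernel32Ctrl INSTANCE = Native.load("kernel32", Kernel32Ctrl.class);
    boolean GenerateConsoleCtrlEvent(int dwCtrlEvent, int dwProcessGroupId);
  }

  private static final int CTRL_BREAK_EVENT = 1;

  /** Deliver CTRL_BREAK to a console process group, roughly analogous to SIGTERM. */
  public static boolean sendCtrlBreak(int processGroupId) {
    return Kernel32Ctrl.INSTANCE.GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, processGroupId);
  }
}
{code}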

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jason Lowe
> Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers 
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature 
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
> interface for sending SIGQUIT to a container.  For that specific feature we 
> could implement it as an additional field in the StopContainerRequest.  
> However that would not address other potential features like the ability for 
> an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
> latter feature would be a very useful debugging tool for users who do not 
> have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.

2013-10-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792310#comment-13792310
 ] 

Junping Du commented on YARN-1289:
--

I think the unit test failure is because other services are unnecessarily loading 
ShuffleHandler after this change. Maybe the right way is to change serviceInit in 
NodeManager to set the default property there?
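
A minimal sketch of that idea (hypothetical placement in NodeManager#serviceInit; the actual patch may differ):
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  // Fall back to mapreduce_shuffle only when no aux services were configured.
  if (conf.get(YarnConfiguration.NM_AUX_SERVICES) == null) {
    conf.setStrings(YarnConfiguration.NM_AUX_SERVICES, "mapreduce_shuffle");
  }
  // ... existing NodeManager initialization continues here ...
  super.serviceInit(conf);
}
{code}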

> Configuration "yarn.nodemanager.aux-services" should have default value for 
> mapreduce_shuffle.
> --
>
> Key: YARN-1289
> URL: https://issues.apache.org/jira/browse/YARN-1289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: wenwupeng
>Assignee: Junping Du
> Attachments: YARN-1289.patch
>
>
> Failed to run a benchmark when the yarn.nodemanager.aux-services value is not 
> configured in yarn-site.xml; it would be better to have a default value.
> 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
> attempt_1381371516570_0001_m_00_1, Status : FAILED
> Container launch failed for container_1381371516570_0001_01_05 : 
> org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
> auxService:mapreduce_shuffle does not exist
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper

2013-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792303#comment-13792303
 ] 

Ted Yu commented on YARN-1296:
--

I found these two fair-scheduler-allocation.xml :

./hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
./hadoop-tools/hadoop-sls/src/test/resources/fair-scheduler-allocation.xml

But they seem to have '' as the top-level element.

> schedulerAllocateTimer is accessed without holding samplerLock in 
> ResourceSchedulerWrapper
> --
>
> Key: YARN-1296
> URL: https://issues.apache.org/jira/browse/YARN-1296
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: yarn-1296-v1.patch
>
>
> Here is related code:
> {code}
>   public Allocation allocate(ApplicationAttemptId attemptId,
>       List<ResourceRequest> resourceRequests,
>       List<ContainerId> containerIds,
>       List<String> strings, List<String> strings2) {
>     if (metricsON) {
>       final Timer.Context context = schedulerAllocateTimer.time();
> {code}
> samplerLock should be used to guard the access.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.

2013-10-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792298#comment-13792298
 ] 

Junping Du commented on YARN-1289:
--

The patch does fix the problem: I can now deploy a cluster and run a job 
successfully without specifying the "yarn.nodemanager.aux-services" value. I will 
take a look at the unit test failures here.

> Configuration "yarn.nodemanager.aux-services" should have default value for 
> mapreduce_shuffle.
> --
>
> Key: YARN-1289
> URL: https://issues.apache.org/jira/browse/YARN-1289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: wenwupeng
>Assignee: Junping Du
> Attachments: YARN-1289.patch
>
>
> Failed to run a benchmark when the yarn.nodemanager.aux-services value is not 
> configured in yarn-site.xml; it would be better to have a default value.
> 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
> attempt_1381371516570_0001_m_00_1, Status : FAILED
> Container launch failed for container_1381371516570_0001_01_05 : 
> org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
> auxService:mapreduce_shuffle does not exist
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792297#comment-13792297
 ] 

Hadoop QA commented on YARN-1296:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607939/yarn-1296-v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls:

  org.apache.hadoop.yarn.sls.TestSLSRunner

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2165//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2165//console

This message is automatically generated.

> schedulerAllocateTimer is accessed without holding samplerLock in 
> ResourceSchedulerWrapper
> --
>
> Key: YARN-1296
> URL: https://issues.apache.org/jira/browse/YARN-1296
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: yarn-1296-v1.patch
>
>
> Here is related code:
> {code}
>   public Allocation allocate(ApplicationAttemptId attemptId,
>       List<ResourceRequest> resourceRequests,
>       List<ContainerId> containerIds,
>       List<String> strings, List<String> strings2) {
>     if (metricsON) {
>       final Timer.Context context = schedulerAllocateTimer.time();
> {code}
> samplerLock should be used to guard the access.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper

2013-10-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-1296:
-

Attachment: yarn-1296-v1.patch

> schedulerAllocateTimer is accessed without holding samplerLock in 
> ResourceSchedulerWrapper
> --
>
> Key: YARN-1296
> URL: https://issues.apache.org/jira/browse/YARN-1296
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: yarn-1296-v1.patch
>
>
> Here is related code:
> {code}
>   public Allocation allocate(ApplicationAttemptId attemptId,
>       List<ResourceRequest> resourceRequests,
>       List<ContainerId> containerIds,
>       List<String> strings, List<String> strings2) {
>     if (metricsON) {
>       final Timer.Context context = schedulerAllocateTimer.time();
> {code}
> samplerLock should be used to guard the access.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper

2013-10-10 Thread Ted Yu (JIRA)
Ted Yu created YARN-1296:


 Summary: schedulerAllocateTimer is accessed without holding 
samplerLock in ResourceSchedulerWrapper
 Key: YARN-1296
 URL: https://issues.apache.org/jira/browse/YARN-1296
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
  public Allocation allocate(ApplicationAttemptId attemptId,
      List<ResourceRequest> resourceRequests,
      List<ContainerId> containerIds,
      List<String> strings, List<String> strings2) {
    if (metricsON) {
      final Timer.Context context = schedulerAllocateTimer.time();
{code}
samplerLock should be used to guard the access.
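A minimal sketch of the guard being suggested (assuming samplerLock is a 
java.util.concurrent.locks.Lock, as used elsewhere in ResourceSchedulerWrapper; the 
rest of the method is elided):
{code}
if (metricsON) {
  samplerLock.lock();
  try {
    final Timer.Context context = schedulerAllocateTimer.time();
    // ... timed allocate path ...
    context.stop();
  } finally {
    samplerLock.unlock();
  }
}
{code}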



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792238#comment-13792238
 ] 

Bikas Saha commented on YARN-1068:
--

This should probably create a new conf and override it there, instead of changing 
things in the original conf.
{code}
+  YarnConfiguration conf = (YarnConfiguration) getConf();
+  conf.set(YarnConfiguration.RM_HA_ID, rmId);
+  return new RMHAServiceTarget(conf);
{code}
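
A minimal sketch of that suggestion (relying on YarnConfiguration's copy constructor; the surrounding method is the one shown in the hunk above):
{code}
+  // Copy the conf so the RM_HA_ID override stays local to this call.
+  YarnConfiguration targetConf = new YarnConfiguration(getConf());
+  targetConf.set(YarnConfiguration.RM_HA_ID, rmId);
+  return new RMHAServiceTarget(targetConf);
{code}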

Shouldn't it be "transitionToActive"?
{code}
+  RMAuditLogger.logFailure(user.getShortUserName(), "transitionToStandby",
+  adminAcl.toString(), "RMHAProtocolService",
+  "Exception transitioning to active");
{code}

We shouldn't be wrapping some unknown exception in an AccessControlException.
{code}
+  private UserGroupInformation checkAccess(String method)
+      throws AccessControlException {
+    try {
+      return RMServerUtils.verifyAccess(adminAcl, method, LOG);
+    } catch (YarnException e) {
+      throw new AccessControlException(e);
+    }
{code}

This method isn't even throwing an AccessControlException, so why do 
transitionToStandby() etc. change their signatures to throw AccessControlException?
{code}
+  public static UserGroupInformation verifyAccess(
+      AccessControlList acl, String method, final Log LOG)
+      throws YarnException {
{code}

The new name doesn't seem to follow the convention used by the other names in that 
file (YARN_SECURITY_SERVICE_AUTHORIZATION_FOO).
{code}
     new Service(
         YarnConfiguration.YARN_SECURITY_SERVICE_AUTHORIZATION_CONTAINER_MANAGEMENT_PROTOCOL,
         ContainerManagementProtocolPB.class),
+    new Service(
+        CommonConfigurationKeys.SECURITY_HA_SERVICE_PROTOCOL_ACL,
+        HAServiceProtocol.class),
{code}

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-10.patch, yarn-1068-1.patch, 
> yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, 
> yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792227#comment-13792227
 ] 

Tsuyoshi OZAWA commented on YARN-1293:
--

Thanks for your review!

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1293:
-

Hadoop Flags: Reviewed

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792221#comment-13792221
 ] 

Jian He commented on YARN-1293:
---

patch looks good, thanks for the fix!

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792209#comment-13792209
 ] 

Hadoop QA commented on YARN-1293:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607927/YARN-1293.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2164//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2164//console

This message is automatically generated.

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792203#comment-13792203
 ] 

Tsuyoshi OZAWA commented on YARN-1172:
--

Thank you for your comment, Karthik. I'm trying to implement this change only for 
the YARN-related *SecretManagers for now, because there are some HDFS-related 
*SecretManagers that extend org.apache.hadoop.security.token.SecretManager.

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792191#comment-13792191
 ] 

Hadoop QA commented on YARN-1295:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607925/YARN-1295.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2163//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2163//console

This message is automatically generated.

> In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" 
> errors
> -
>
> Key: YARN-1295
> URL: https://issues.apache.org/jira/browse/YARN-1295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1295.patch
>
>
> I missed this when working on YARN-1271.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore

2013-10-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792190#comment-13792190
 ] 

Jian He commented on YARN-1058:
---

YARN-1116 fixed the AMRMToken part, and MAPREDUCE-5476 fixed the staging dir part.

> Recovery issues on RM Restart with FileSystemRMStateStore
> -
>
> Key: YARN-1058
> URL: https://issues.apache.org/jira/browse/YARN-1058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> App recovery doesn't work as expected using FileSystemRMStateStore.
> Steps to reproduce:
> - Ran sleep job with a single map and sleep time of 2 mins
> - Restarted RM while the map task is still running
> - The first attempt fails with the following error
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Password not found for ApplicationAttempt 
> appattempt_1376294441253_0001_01
>   at org.apache.hadoop.ipc.Client.call(Client.java:1404)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1357)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at $Proxy28.finishApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
> {noformat}
> - The second attempt fails with a different error:
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on 
> /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist:
>  File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have 
> any open files.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792187#comment-13792187
 ] 

Tsuyoshi OZAWA commented on YARN-1293:
--

Closing this issue itself is no problem.
The essential problem is that there is no documentation in the Hadoop project about 
locale. IMHO, we should document it instead of fixing this problem. The following 
documents are candidates to update:

1. http://wiki.apache.org/hadoop/HowToContribute
2. BUILDING.txt.

What do you think?

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792183#comment-13792183
 ] 

Karthik Kambatla commented on YARN-1172:


When filing the JIRA, I was thinking only of the YARN-related *SecretManagers. I 
haven't looked into the mechanics of doing that; it might require 
org.apache.hadoop.security.token.SecretManager to be an AbstractService. If that 
is the case, it might be better to open a separate Common JIRA for that change 
alone.
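
For illustration only, one shape the YARN-side change could take without touching the common SecretManager base class is a thin wrapper service driven by composition; every name below is hypothetical, not from an actual patch:
{code}
import org.apache.hadoop.service.AbstractService;

public class SecretManagerWrapperService extends AbstractService {

  /** Hypothetical lifecycle hooks a YARN secret manager could expose. */
  public interface Lifecycle {
    void startThreads() throws Exception;
    void stopThreads();
  }

  private final Lifecycle secretManager;

  public SecretManagerWrapperService(String name, Lifecycle secretManager) {
    super(name);
    this.secretManager = secretManager;
  }

  @Override
  protected void serviceStart() throws Exception {
    secretManager.startThreads(); // e.g. begin rolling master keys
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    secretManager.stopThreads();
    super.serviceStop();
  }
}
{code}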

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792177#comment-13792177
 ] 

Jian He commented on YARN-1293:
---

bq. I found that this problem is caused when the system locale is not English.
Ahh, can you please close it? Thanks.

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore

2013-10-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792179#comment-13792179
 ] 

Karthik Kambatla commented on YARN-1058:


I have also noticed that this was fixed in my testing of RM HA, but I haven't 
figured out what change has fixed this. [~jianhe], any idea which JIRA might 
have fixed this? 

> Recovery issues on RM Restart with FileSystemRMStateStore
> -
>
> Key: YARN-1058
> URL: https://issues.apache.org/jira/browse/YARN-1058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> App recovery doesn't work as expected using FileSystemRMStateStore.
> Steps to reproduce:
> - Ran sleep job with a single map and sleep time of 2 mins
> - Restarted RM while the map task is still running
> - The first attempt fails with the following error
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Password not found for ApplicationAttempt 
> appattempt_1376294441253_0001_01
>   at org.apache.hadoop.ipc.Client.call(Client.java:1404)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1357)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at $Proxy28.finishApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
> {noformat}
> - The second attempt fails with a different error:
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on 
> /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist:
>  File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have 
> any open files.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1293:
-

Attachment: YARN-1293.1.patch

Fix to set LANG to C.
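
For reference, a self-contained sketch of forcing a locale-independent shell environment in a test (hypothetical usage; not necessarily how the attached patch implements it):
{code}
import java.io.File;
import java.io.FileWriter;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class LangCSketch {
  public static void main(String[] args) throws Exception {
    File tmpDir = new File(System.getProperty("java.io.tmpdir"));
    File script = new File(tmpDir, "bad_env.sh");
    try (FileWriter w = new FileWriter(script)) {
      w.write("export 1bad=foo\n"); // invalid identifier, so bash reports an error
    }
    Map<String, String> env = new HashMap<String, String>();
    env.put("LANG", "C"); // bash then emits English diagnostics regardless of system locale
    ShellCommandExecutor shexc = new ShellCommandExecutor(
        new String[] { "/bin/bash", script.getAbsolutePath() }, tmpDir, env);
    try {
      shexc.execute();
    } catch (Exception e) {
      // The captured diagnostics are now locale-independent English text,
      // which is what a test assertion can safely match against.
      System.out.println(shexc.getOutput());
    }
  }
}
{code}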

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
> Attachments: YARN-1293.1.patch
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned YARN-1293:


Assignee: Tsuyoshi OZAWA

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792172#comment-13792172
 ] 

Tsuyoshi OZAWA commented on YARN-1293:
--

Hi Jian, LANG in my environment is ja_JP.UTF-8.

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792169#comment-13792169
 ] 

Jian He commented on YARN-1293:
---

Hi [~ozawa], I did not reproduce this locally; what is the environment you are 
running?

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors

2013-10-10 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1295:
-

Attachment: YARN-1295.patch

> In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" 
> errors
> -
>
> Key: YARN-1295
> URL: https://issues.apache.org/jira/browse/YARN-1295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1295.patch
>
>
> I missed this when working on YARN-1271.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors

2013-10-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792161#comment-13792161
 ] 

Sandy Ryza commented on YARN-1295:
--

Grepped through the code for "-c" and didn't find anywhere else that needs this 
change.

> In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" 
> errors
> -
>
> Key: YARN-1295
> URL: https://issues.apache.org/jira/browse/YARN-1295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> I missed this when working on YARN-1271.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1271) "Text file busy" errors launching containers again

2013-10-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792158#comment-13792158
 ] 

Sandy Ryza commented on YARN-1271:
--

These errors are still coming up for me after the patch.  I took another look 
and apparently I had looked at UnixShellScriptBuilder, but missed 
UnixLocalWrapperScriptBuilder, which also uses the "-c".  Filed YARN-1295 for 
this.  Sorry for all the noise.
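
For reference, a sketch of the kind of change involved (illustrative variable names, not the actual UnixLocalWrapperScriptBuilder code): without "-c", bash reads and interprets the launch script itself rather than exec'ing the freshly written file, which is what avoids the ETXTBSY / "Text file busy" failure:
{code}
// Before: the wrapper script makes bash exec the launch script file.
pout.println("exec /bin/bash -c \"" + launchScriptPath + "\"");

// After: bash interprets the file directly; no exec of the just-written file.
pout.println("exec /bin/bash \"" + launchScriptPath + "\"");
{code}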

> "Text file busy" errors launching containers again
> --
>
> Key: YARN-1271
> URL: https://issues.apache.org/jira/browse/YARN-1271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.2.0
>
> Attachments: YARN-1271-branch-2.patch, YARN-1271.patch
>
>
> The error is shown below in the comments.
> MAPREDUCE-2374 fixed this by removing "-c" when running the container launch 
> script.  It looks like the "-c" got brought back during the windows branch 
> merge, so we should remove it again.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792157#comment-13792157
 ] 

Tsuyoshi OZAWA commented on YARN-1293:
--

I found that this problem is caused when the system locale is not English.

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors

2013-10-10 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-1295:


 Summary: In UnixLocalWrapperScriptBuilder, using bash -c can cause 
"Text file busy" errors
 Key: YARN-1295
 URL: https://issues.apache.org/jira/browse/YARN-1295
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza


I missed this when working on YARN-1271.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1294) Log4j settings in container-log4j.properties cannot be overridden

2013-10-10 Thread Eugene Koifman (JIRA)
Eugene Koifman created YARN-1294:


 Summary: Log4j settings in container-log4j.properties cannot be 
overridden 
 Key: YARN-1294
 URL: https://issues.apache.org/jira/browse/YARN-1294
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Eugene Koifman


Setting HADOOP_ROOT_LOGGER or -Dhadoop.root.logger has no effect.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792132#comment-13792132
 ] 

Tsuyoshi OZAWA commented on YARN-1172:
--

Should we make org.apache.hadoop.security.token.SecretManager extend 
AbstractService for this JIRA? Or should we only make the YARN-related 
*SecretManagers extend AbstractService?

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1293:
-

Fix Version/s: 2.2.0

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> --
>
> Key: YARN-1293
> URL: https://issues.apache.org/jira/browse/YARN-1293
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: linux
>Reporter: Tsuyoshi OZAWA
> Fix For: 2.2.0
>
>
> {quote}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:48)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-1293:


 Summary: TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails 
on trunk
 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA


{quote}
---
Test set: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
---
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
  Time elapsed: 0.114 sec  <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
{quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore

2013-10-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792121#comment-13792121
 ] 

Jian He commented on YARN-1058:
---

I believe we have fixed this; closing it.

> Recovery issues on RM Restart with FileSystemRMStateStore
> -
>
> Key: YARN-1058
> URL: https://issues.apache.org/jira/browse/YARN-1058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> App recovery doesn't work as expected using FileSystemRMStateStore.
> Steps to reproduce:
> - Ran sleep job with a single map and sleep time of 2 mins
> - Restarted RM while the map task is still running
> - The first attempt fails with the following error
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Password not found for ApplicationAttempt 
> appattempt_1376294441253_0001_01
>   at org.apache.hadoop.ipc.Client.call(Client.java:1404)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1357)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at $Proxy28.finishApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
> {noformat}
> - The second attempt fails with a different error:
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on 
> /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist:
>  File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have 
> any open files.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore

2013-10-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-1058.
---

Resolution: Fixed

> Recovery issues on RM Restart with FileSystemRMStateStore
> -
>
> Key: YARN-1058
> URL: https://issues.apache.org/jira/browse/YARN-1058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> App recovery doesn't work as expected using FileSystemRMStateStore.
> Steps to reproduce:
> - Ran sleep job with a single map and sleep time of 2 mins
> - Restarted RM while the map task is still running
> - The first attempt fails with the following error
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Password not found for ApplicationAttempt 
> appattempt_1376294441253_0001_01
>   at org.apache.hadoop.ipc.Client.call(Client.java:1404)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1357)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at $Proxy28.finishApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
> {noformat}
> - The second attempt fails with a different error:
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on 
> /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist:
>  File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have 
> any open files.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792119#comment-13792119
 ] 

Hadoop QA commented on YARN-1182:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607911/yarn-1182-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2162//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2162//console

This message is automatically generated.

> MiniYARNCluster creates and inits the RM/NM only on start()
> ---
>
> Key: YARN-1182
> URL: https://issues.apache.org/jira/browse/YARN-1182
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-1182-1.patch, yarn-1182-2.patch
>
>
> MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
> and init them during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

2013-10-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792113#comment-13792113
 ] 

Junping Du commented on YARN-879:
-

Thanks Devaraj K for review!

> Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
> --
>
> Key: YARN-879
> URL: https://issues.apache.org/jira/browse/YARN-879
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.2.1
>
> Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, 
> YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch
>
>
> getResources() should return the list of containers allocated by the RM. 
> However, it currently returns null directly. Worse, if LOG.debug is 
> enabled, this will definitely cause an NPE.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned YARN-1172:


Assignee: Tsuyoshi OZAWA

> Convert *SecretManagers in the RM to services
> -
>
> Key: YARN-1172
> URL: https://issues.apache.org/jira/browse/YARN-1172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792099#comment-13792099
 ] 

Sandy Ryza commented on YARN-1182:
--

+1

> MiniYARNCluster creates and inits the RM/NM only on start()
> ---
>
> Key: YARN-1182
> URL: https://issues.apache.org/jira/browse/YARN-1182
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-1182-1.patch, yarn-1182-2.patch
>
>
> MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
> and init them during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1182:
---

Attachment: yarn-1182-2.patch

Thanks for the review, Sandy. Here is an updated patch that fixes that.

For sanity, I ran all tests under hadoop-mapreduce-project and the change 
doesn't introduce any test failures.

> MiniYARNCluster creates and inits the RM/NM only on start()
> ---
>
> Key: YARN-1182
> URL: https://issues.apache.org/jira/browse/YARN-1182
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-1182-1.patch, yarn-1182-2.patch
>
>
> MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
> and init them during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792079#comment-13792079
 ] 

Hudson commented on YARN-1265:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4581 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4581/])
YARN-1265. Fair Scheduler chokes on unhealthy node reconnect (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531146)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> Fair Scheduler chokes on unhealthy node reconnect
> -
>
> Key: YARN-1265
> URL: https://issues.apache.org/jira/browse/YARN-1265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.1.1-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.2.1
>
> Attachments: YARN-1265-1.patch, YARN-1265.patch
>
>
> Only nodes in the RUNNING state are tracked by schedulers.  When a node 
> reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if 
> it's not in the RUNNING state.  The FairScheduler doesn't guard against this.
> I think the best way to fix this is to check to see whether a node is RUNNING 
> before telling the scheduler to remove it.
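
A minimal, self-contained sketch of the guard proposed above (hypothetical 
NodeState/Scheduler types, not the actual RMNodeImpl or FairScheduler code): only a 
node that was in the RUNNING state, and therefore tracked by the scheduler, is removed 
on reconnect.

{code}
enum NodeState { NEW, RUNNING, UNHEALTHY, DECOMMISSIONED }

class ReconnectGuardSketch {

  interface Scheduler {
    void removeNode(String nodeId);
  }

  // Only RUNNING nodes are tracked by the scheduler, so only they need to be
  // removed from it when the node reconnects.
  static void onNodeReconnect(String nodeId, NodeState stateBeforeReconnect,
      Scheduler scheduler) {
    if (stateBeforeReconnect == NodeState.RUNNING) {
      scheduler.removeNode(nodeId);
    }
  }
}
{code}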



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1292) De-link container life cycle from the process it runs

2013-10-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792063#comment-13792063
 ] 

Bikas Saha commented on YARN-1292:
--

This can be achieved in a backwards-compatible manner in the following way:
1) The StartContainer request will have a new flag that says whether the container 
is attached to a process or not. The default value is true for back-compat.
2) If the above flag is false then the container is completed on the NM only 
when
a) the RM terminates the container (this already happens today)
b) the AM calls StopContainer on it (this is currently supported)
The main change in the NM would be to not trigger the end of the container, i.e. 
keep the container in a running state when there is no process associated with 
it.
3) Create a new API called startProcess() that can be used to launch a new 
process in a container. For a first cut, the NM can disallow starting a process 
while a process is already running. This API would be secured using the existing 
AMNM token.

No changes are expected to be needed in the RM since the NM will continue to 
report this container as running to the RM. This should be a fairly localised 
NM-only change.
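
A minimal sketch of the proposed semantics (hypothetical types and method names; 
none of this exists in YARN today): a container marked as detached from its process 
is completed only by an explicit stop or RM termination, never by process exit.

{code}
// Sketch only: hypothetical classes, not the actual NM implementation.
enum SketchContainerState { RUNNING, COMPLETED }

class DetachedContainerSketch {
  // Proposed StartContainer flag; true keeps today's behavior (back-compat).
  private final boolean attachedToProcess;
  private SketchContainerState state = SketchContainerState.RUNNING;

  DetachedContainerSketch(boolean attachedToProcess) {
    this.attachedToProcess = attachedToProcess;
  }

  // Called when the process launched in the container exits.
  void onProcessExited() {
    if (attachedToProcess) {
      // Back-compat path: an attached container completes when its process exits.
      state = SketchContainerState.COMPLETED;
    }
    // A detached container stays RUNNING so the AM can later use the proposed
    // startProcess() call to run another process in it.
  }

  // Called for an explicit StopContainer from the AM or termination by the RM.
  void onStopContainer() {
    state = SketchContainerState.COMPLETED;
  }

  SketchContainerState getState() {
    return state;
  }
}
{code}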


> De-link container life cycle from the process it runs
> -
>
> Key: YARN-1292
> URL: https://issues.apache.org/jira/browse/YARN-1292
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.1.1-beta
>Reporter: Bikas Saha
>
> Currently, a container is considered done when its OS process exits. This 
> makes it cumbersome for apps to be able to reuse containers for different 
> processes. Long-running daemons may want to run in the same containers as their 
> previous versions. So e.g. if an HBase region server crashes or is upgraded, it 
> would want to restart in the same container, where everything it needs would 
> already be warm and ready.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792061#comment-13792061
 ] 

Hadoop QA commented on YARN-415:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607895/YARN-415--n6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2161//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2161//console

This message is automatically generated.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1292) De-link container life cycle from the process it runs

2013-10-10 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1292:


 Summary: De-link container life cycle from the process it runs
 Key: YARN-1292
 URL: https://issues.apache.org/jira/browse/YARN-1292
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha


Currently, a container is considered done when its OS process exits. This makes 
it cumbersome for apps to be able to reuse containers for different processes. 
Long-running daemons may want to run in the same containers as their previous 
versions. So e.g. if an HBase region server crashes or is upgraded, it would want 
to restart in the same container, where everything it needs would already be warm 
and ready.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect

2013-10-10 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792046#comment-13792046
 ] 

Alejandro Abdelnur commented on YARN-1265:
--

+1

> Fair Scheduler chokes on unhealthy node reconnect
> -
>
> Key: YARN-1265
> URL: https://issues.apache.org/jira/browse/YARN-1265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.1.1-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1265-1.patch, YARN-1265.patch
>
>
> Only nodes in the RUNNING state are tracked by schedulers.  When a node 
> reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if 
> it's not in the RUNNING state.  The FairScheduler doesn't guard against this.
> I think the best way to fix this is to check to see whether a node is RUNNING 
> before telling the scheduler to remove it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-10-10 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated YARN-415:
-

Attachment: YARN-415--n6.patch

With the 1st option it's not clear how to implement protection against leaks: 
there is no event that can be used to check for leaks in that case. At the same 
time, current YARN behavior does not support containers surviving after the AM 
finishes, so the 2nd option is acceptable. This may need to change once there is 
support for long-lived apps and attempts that stay alive after the AM is stopped.

Attaching a patch which implements option #2 and adds a test for it.
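
For reference, a minimal sketch (hypothetical helper, not part of the attached 
patch) of the MB-seconds formula from the issue description: reserved memory of 
each container multiplied by its lifetime, summed over all containers of the 
application.

{code}
class MemorySecondsSketch {
  // reservedMB[i] and lifetimeSeconds[i] describe container i of the application.
  static long memoryMbSeconds(long[] reservedMB, long[] lifetimeSeconds) {
    long total = 0;
    for (int i = 0; i < reservedMB.length; i++) {
      total += reservedMB[i] * lifetimeSeconds[i];
    }
    return total;
  }
}
{code}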


> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n2.patch, YARN-415--n3.patch, 
> YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1291) RM INFO logs limit scheduling speed

2013-10-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791999#comment-13791999
 ] 

Sandy Ryza commented on YARN-1291:
--

I would like to demote the RMContainerImpl state transition log to DEBUG and 
use an AsyncAppender for the RMAuditLogger (at least make this configurable if 
not default).
[~vinodkv], as these logs are pretty core, I wanted to check what your thoughts 
are on this.
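
A minimal sketch of the AsyncAppender idea, assuming log4j 1.x on the classpath 
(wired up programmatically here only for illustration; in practice this would be 
made configurable rather than hard-coded):

{code}
import java.util.Enumeration;
import org.apache.log4j.Appender;
import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;

class AsyncAuditLogSketch {
  static void makeAuditLoggingAsync() {
    Logger auditLog = Logger.getLogger(
        "org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger");
    AsyncAppender async = new AsyncAppender();
    // Move the existing (synchronous) appenders behind the async appender so the
    // scheduler thread only enqueues log events instead of writing them inline.
    for (Enumeration<?> e = auditLog.getAllAppenders(); e.hasMoreElements();) {
      async.addAppender((Appender) e.nextElement());
    }
    auditLog.removeAllAppenders();
    auditLog.addAppender(async);
  }
}
{code}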

> RM INFO logs limit scheduling speed
> ---
>
> Key: YARN-1291
> URL: https://issues.apache.org/jira/browse/YARN-1291
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> I've been running some microbenchmarks to see how fast the Fair Scheduler can 
> fill up a cluster and found its performance is significantly hampered by 
> logging.
> I tested with 500 (mock) nodes, and found that:
> * Taking out fair scheduler INFO logs on the critical path brought down the 
> latency from 14000 ms to 6000 ms
> * Taking out the INFO that RMContainerImpl logs when a container transitions 
> brought it down from 6000 ms to 4000 ms
> * Taking out RMAuditLogger logs brought it down from 4000 ms to 1700 ms



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1291) RM INFO logs limit scheduling speed

2013-10-10 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-1291:


 Summary: RM INFO logs limit scheduling speed
 Key: YARN-1291
 URL: https://issues.apache.org/jira/browse/YARN-1291
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza


I've been running some microbenchmarks to see how fast the Fair Scheduler can 
fill up a cluster and found its performance is significantly hampered by 
logging.

I tested with 500 (mock) nodes, and found that:
* Taking out fair scheduler INFO logs on the critical path brought down the 
latency from 14000 ms to 6000 ms
* Taking out the INFO that RMContainerImpl logs when a container transitions 
brought it down from 6000 ms to 4000 ms
* Taking out RMAuditLogger logs brought it down from 4000 ms to 1700 ms



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-10-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791956#comment-13791956
 ] 

Karthik Kambatla commented on YARN-1241:


Looks good to me. 

> In Fair Scheduler maxRunningApps does not work for non-leaf queues
> --
>
> Key: YARN-1241
> URL: https://issues.apache.org/jira/browse/YARN-1241
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, 
> YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241.patch
>
>
> Setting the maxRunningApps property on a parent queue should ensure that the 
> total number of running apps across all of its subqueues cannot exceed that limit.
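
A minimal sketch of the intended semantics (hypothetical Queue type, not the 
FairScheduler implementation): an app may only start if the leaf queue and every 
ancestor queue are below their maxRunningApps limits, where a parent's running-app 
count is the sum over its subqueues.

{code}
class QueueSketch {
  QueueSketch parent;            // null for the root queue
  int runningApps;               // for a parent queue, the sum over its subqueues
  int maxRunningApps = Integer.MAX_VALUE;

  // An app can start only if no queue on the path up to the root is at its limit.
  boolean canRunMoreApps() {
    for (QueueSketch q = this; q != null; q = q.parent) {
      if (q.runningApps >= q.maxRunningApps) {
        return false;
      }
    }
    return true;
  }
}
{code}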



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791952#comment-13791952
 ] 

Sandy Ryza commented on YARN-1182:
--

Nit:
{code}
+conf.set(YarnConfiguration.RM_ADMIN_ADDRESS, hostname + ":0");
+conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, hostname + ":0");
+conf.set(YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS, hostname + 
":0");
+WebAppUtils.setRMWebAppHostnameAndPort(getConfig(), hostname, 0);
{code}
getConfig() should be replaced with conf on the last line, right?

Otherwise LGTM

> MiniYARNCluster creates and inits the RM/NM only on start()
> ---
>
> Key: YARN-1182
> URL: https://issues.apache.org/jira/browse/YARN-1182
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-1182-1.patch
>
>
> MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
> and init them during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-10 Thread Wei Yan (JIRA)
Wei Yan created YARN-1290:
-

 Summary: Let continuous scheduling achieve more balanced task 
assignment
 Key: YARN-1290
 URL: https://issues.apache.org/jira/browse/YARN-1290
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan


Currently, in continuous scheduling (YARN-1010), in each round, the thread 
iterates over pre-ordered nodes and assigns tasks. This mechanism may overload 
the first several nodes, while later nodes receive no tasks.

We should sort all nodes according to available resources. In each round, tasks 
would then be assigned first to the nodes with the most available capacity, which 
balances the load distribution among all nodes.
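
A minimal sketch of the proposed ordering (hypothetical NodeInfo type, not the 
actual FairScheduler node classes): before each continuous-scheduling round, sort 
the nodes so the ones with the most available memory are offered tasks first.

{code}
import java.util.Comparator;
import java.util.List;

class ContinuousSchedulingSketch {

  static class NodeInfo {
    final String host;
    final long availableMB;
    NodeInfo(String host, long availableMB) {
      this.host = host;
      this.availableMB = availableMB;
    }
  }

  // Descending by available memory: larger remaining capacity is scheduled first.
  static void sortByAvailableResource(List<NodeInfo> nodes) {
    nodes.sort(Comparator.comparingLong((NodeInfo n) -> n.availableMB).reversed());
  }
}
{code}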



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791695#comment-13791695
 ] 

Hadoop QA commented on YARN-1182:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607835/yarn-1182-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2160//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2160//console

This message is automatically generated.

> MiniYARNCluster creates and inits the RM/NM only on start()
> ---
>
> Key: YARN-1182
> URL: https://issues.apache.org/jira/browse/YARN-1182
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-1182-1.patch
>
>
> MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
> and init them during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1182:
---

Attachment: yarn-1182-1.patch

Straight-forward patch that moves creation and init to serviceInit(). Ran a 
couple of tests that use MiniYARNCluster. Submitting patch to see if Jenkins 
finds any other issues.
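
A minimal sketch of the pattern this change moves to (hypothetical service classes, 
not the actual MiniYARNCluster or Hadoop Service API): child services are created 
and init()-ed inside serviceInit(), so a cluster that has only been init()-ed 
already exposes fully configured RM/NM services before start().

{code}
abstract class ServiceSketch {
  final void init()  { serviceInit(); }
  final void start() { serviceStart(); }
  protected void serviceInit()  { }
  protected void serviceStart() { }
}

class MiniClusterSketch extends ServiceSketch {
  private ServiceSketch resourceManager;
  private ServiceSketch nodeManager;

  @Override
  protected void serviceInit() {
    // Create and init the child services here, not in serviceStart(), so tests
    // can read their configuration/addresses after init() but before start().
    resourceManager = new ServiceSketch() { };
    nodeManager = new ServiceSketch() { };
    resourceManager.init();
    nodeManager.init();
  }

  @Override
  protected void serviceStart() {
    resourceManager.start();
    nodeManager.start();
  }
}
{code}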

> MiniYARNCluster creates and inits the RM/NM only on start()
> ---
>
> Key: YARN-1182
> URL: https://issues.apache.org/jira/browse/YARN-1182
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-1182-1.patch
>
>
> MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
> and init them during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791525#comment-13791525
 ] 

Hudson commented on YARN-1284:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1574 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1574/])
Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530716)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> LCE: Race condition leaves dangling cgroups entries for killed containers
> -
>
> Key: YARN-1284
> URL: https://issues.apache.org/jira/browse/YARN-1284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, 
> YARN-1284.patch, YARN-1284.patch
>
>
> When LCE & cgroups are enabled and a container is killed (in this case 
> by its owning AM, an MRAM), there seems to be a race condition at the OS level 
> between delivering the SIGTERM/SIGKILL and the OS doing all necessary cleanup. 
> LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, 
> immediately attempts to clean up the cgroups entry for the container. But 
> this is failing with an error like:
> {code}
> 2013-10-07 15:21:24,359 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1381179532433_0016_01_11 is : 143
> 2013-10-07 15:21:24,359 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Processing container_1381179532433_0016_01_11 of type 
> UPDATE_DIAGNOSTICS_MSG
> 2013-10-07 15:21:24,359 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> deleteCgroup: 
> /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11
> 2013-10-07 15:21:24,359 WARN 
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> Unable to delete cgroup at: 
> /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11
> {code}
> CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM 
> containers to avoid this problem. It seems this should be done for all 
> containers.
> Still, waiting an extra 500 ms seems too expensive.
> We should look at a more time-efficient way of doing this, maybe spinning 
> until the deleteCgroup() succeeds, with a minimal sleep between attempts and 
> an overall timeout.
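
A minimal sketch of the "spin with a minimal sleep and a timeout" idea 
(hypothetical helper, not the actual CgroupsLCEResourcesHandler code): keep 
retrying the cgroup directory removal until the kernel has released it or the 
deadline passes.

{code}
import java.io.File;

class CgroupDeleteSketch {
  static boolean deleteWithRetry(File cgroupDir, long timeoutMs, long sleepMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      // rmdir succeeds only once all tasks have actually left the cgroup.
      if (cgroupDir.delete()) {
        return true;
      }
      Thread.sleep(sleepMs);   // brief pause instead of a fixed 500 ms wait
    }
    return cgroupDir.delete(); // one last attempt before giving up
  }
}
{code}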



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791526#comment-13791526
 ] 

Hudson commented on YARN-1283:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1574 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1574/])
YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so 
that clients don't need to do scheme-mangling. Contributed by Omkar Vinit 
Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530819)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
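
For context, a minimal sketch of the behavior the fix above describes (hypothetical 
helper, not the actual WebAppUtils change): the tracking URL handed to clients must 
already carry the scheme implied by yarn.http.policy, so no scheme-mangling is 
needed on the client side.

{code}
class TrackingUrlSketch {
  // httpsOnly corresponds to yarn.http.policy=HTTPS_ONLY.
  static String trackingUrl(boolean httpsOnly, String proxyHostPort, String appId) {
    String scheme = httpsOnly ? "https://" : "http://";
    return scheme + proxyHostPort + "/proxy/" + appId + "/";
  }
}
{code}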


> Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
> -
>
> Key: YARN-1283
> URL: https://issues.apache.org/jira/browse/YARN-1283
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
>  Labels: newbie
> Fix For: 2.2.1
>
> Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, 
> YARN-1283.20131008.2.patch, YARN-1283.3.patch
>
>
> After setting yarn.http.policy=HTTPS_ONLY, the job output shows an incorrect 
> "The url to track the job".
> Currently, it's printing 
> http://RM:/proxy/application_1381162886563_0001/ instead of 
> https://RM:/proxy/application_1381162886563_0001/
> http://hostname:8088/proxy/application_1381162886563_0001/ is invalid
> hadoop  jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 
> 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname/100.00.00.000:8032
> 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1
> 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. 
> Instead, use mapreduce.job.user.name
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. 
> Instead, use mapreduce.job.jar
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.map.tasks.speculative.execution is deprecated. Instead, use 
> mapreduce.map.speculative
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is 
> deprecated. Instead, use mapreduce.job.reduces
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class 
> is deprecated. Instead, use mapreduce.job.partitioner.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
> mapreduce.reduce.speculative
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.mapoutput.value.class is deprecated. Instead, use 
> mapreduce.map.output.value.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is 
> deprecated. Instead, use mapreduce.job.map.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is 
> deprecated. Instead, use mapreduce.job.name
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is 
> deprecated. Instead, use mapreduce.job.reduce.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class 
> is deprecated. Instead, use mapreduce.job.inputformat.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is 
> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapreduce.outputformat.class is deprecated. Instead, use 
> mapreduce.job.outputformat.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is 
> deprecated. Instead, use mapreduce.job.maps
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class 
> is deprecated. Instead, use mapreduce.map.output.key.class
> 13/10/

[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791527#comment-13791527
 ] 

Hudson commented on YARN-879:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1574 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1574/])
YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. 
Contributed by Junping Du. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
> --
>
> Key: YARN-879
> URL: https://issues.apache.org/jira/browse/YARN-879
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.2.1
>
> Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, 
> YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch
>
>
> getResources() should return the list of containers allocated by the RM. 
> However, it currently returns null directly. Worse, if LOG.debug is 
> enabled, this will definitely cause an NPE.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791487#comment-13791487
 ] 

Hudson commented on YARN-879:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1548 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1548/])
YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. 
Contributed by Junping Du. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
> --
>
> Key: YARN-879
> URL: https://issues.apache.org/jira/browse/YARN-879
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.2.1
>
> Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, 
> YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch
>
>
> getResources() should return the list of containers allocated by the RM. 
> However, it currently returns null directly. Worse, if LOG.debug is 
> enabled, this will definitely cause an NPE.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791486#comment-13791486
 ] 

Hudson commented on YARN-1283:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1548 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1548/])
YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so 
that clients don't need to do scheme-mangling. Contributed by Omkar Vinit 
Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530819)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


> Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
> -
>
> Key: YARN-1283
> URL: https://issues.apache.org/jira/browse/YARN-1283
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
>  Labels: newbie
> Fix For: 2.2.1
>
> Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, 
> YARN-1283.20131008.2.patch, YARN-1283.3.patch
>
>
> After setting yarn.http.policy=HTTPS_ONLY, the job output shows an incorrect 
> "The url to track the job".
> Currently, it's printing 
> http://RM:/proxy/application_1381162886563_0001/ instead of 
> https://RM:/proxy/application_1381162886563_0001/
> http://hostname:8088/proxy/application_1381162886563_0001/ is invalid
> hadoop  jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 
> 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname/100.00.00.000:8032
> 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1
> 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. 
> Instead, use mapreduce.job.user.name
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. 
> Instead, use mapreduce.job.jar
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.map.tasks.speculative.execution is deprecated. Instead, use 
> mapreduce.map.speculative
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is 
> deprecated. Instead, use mapreduce.job.reduces
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class 
> is deprecated. Instead, use mapreduce.job.partitioner.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
> mapreduce.reduce.speculative
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.mapoutput.value.class is deprecated. Instead, use 
> mapreduce.map.output.value.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is 
> deprecated. Instead, use mapreduce.job.map.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is 
> deprecated. Instead, use mapreduce.job.name
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is 
> deprecated. Instead, use mapreduce.job.reduce.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class 
> is deprecated. Instead, use mapreduce.job.inputformat.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is 
> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapreduce.outputformat.class is deprecated. Instead, use 
> mapreduce.job.outputformat.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is 
> deprecated. Instead, use mapreduce.job.maps
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class 
> is deprecated. Instead, use mapreduce.map.output.key.class
> 13/10/07 18:39:4

[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791485#comment-13791485
 ] 

Hudson commented on YARN-1284:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1548 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1548/])
Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530716)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> LCE: Race condition leaves dangling cgroups entries for killed containers
> -
>
> Key: YARN-1284
> URL: https://issues.apache.org/jira/browse/YARN-1284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, 
> YARN-1284.patch, YARN-1284.patch
>
>
> When LCE & cgroups are enabled and a container is killed (in this case 
> by its owning AM, an MRAM), there seems to be a race condition at the OS level 
> between delivering the SIGTERM/SIGKILL and the OS doing all necessary cleanup. 
> LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, 
> immediately attempts to clean up the cgroups entry for the container. But 
> this is failing with an error like:
> {code}
> 2013-10-07 15:21:24,359 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1381179532433_0016_01_11 is : 143
> 2013-10-07 15:21:24,359 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Processing container_1381179532433_0016_01_11 of type 
> UPDATE_DIAGNOSTICS_MSG
> 2013-10-07 15:21:24,359 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> deleteCgroup: 
> /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11
> 2013-10-07 15:21:24,359 WARN 
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> Unable to delete cgroup at: 
> /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11
> {code}
> CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM 
> containers to avoid this problem. It seems this should be done for all 
> containers.
> Still, waiting an extra 500 ms seems too expensive.
> We should look at a more time-efficient way of doing this, maybe spinning 
> until the deleteCgroup() succeeds, with a minimal sleep between attempts and 
> an overall timeout.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791385#comment-13791385
 ] 

Hudson commented on YARN-1283:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #358 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/358/])
YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so 
that clients don't need to do scheme-mangling. Contributed by Omkar Vinit 
Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530819)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


> Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
> -
>
> Key: YARN-1283
> URL: https://issues.apache.org/jira/browse/YARN-1283
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
>  Labels: newbie
> Fix For: 2.2.1
>
> Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, 
> YARN-1283.20131008.2.patch, YARN-1283.3.patch
>
>
> After setting yarn.http.policy=HTTPS_ONLY, the job output shows an incorrect 
> "The url to track the job".
> Currently, it's printing 
> http://RM:/proxy/application_1381162886563_0001/ instead of 
> https://RM:/proxy/application_1381162886563_0001/
> http://hostname:8088/proxy/application_1381162886563_0001/ is invalid
> hadoop  jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 
> 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname/100.00.00.000:8032
> 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1
> 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. 
> Instead, use mapreduce.job.user.name
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. 
> Instead, use mapreduce.job.jar
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.map.tasks.speculative.execution is deprecated. Instead, use 
> mapreduce.map.speculative
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is 
> deprecated. Instead, use mapreduce.job.reduces
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class 
> is deprecated. Instead, use mapreduce.job.partitioner.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
> mapreduce.reduce.speculative
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapred.mapoutput.value.class is deprecated. Instead, use 
> mapreduce.map.output.value.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is 
> deprecated. Instead, use mapreduce.job.map.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is 
> deprecated. Instead, use mapreduce.job.name
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is 
> deprecated. Instead, use mapreduce.job.reduce.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class 
> is deprecated. Instead, use mapreduce.job.inputformat.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is 
> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 13/10/07 18:39:40 INFO Configuration.deprecation: 
> mapreduce.outputformat.class is deprecated. Instead, use 
> mapreduce.job.outputformat.class
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is 
> deprecated. Instead, use mapreduce.job.maps
> 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class 
> is deprecated. Instead, use mapreduce.map.output.key.class
> 13/10/07 18:39:40 

[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791386#comment-13791386
 ] 

Hudson commented on YARN-879:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #358 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/358/])
YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. 
Contributed by Junping Du. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
> --
>
> Key: YARN-879
> URL: https://issues.apache.org/jira/browse/YARN-879
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.2.1
>
> Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, 
> YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch
>
>
> getResources() should return the list of containers allocated by the RM. 
> However, it currently returns null directly. Worse, if LOG.debug is 
> enabled, this will definitely cause an NPE.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791384#comment-13791384
 ] 

Hudson commented on YARN-1284:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #358 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/358/])
Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530716)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> LCE: Race condition leaves dangling cgroups entries for killed containers
> -
>
> Key: YARN-1284
> URL: https://issues.apache.org/jira/browse/YARN-1284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, 
> YARN-1284.patch, YARN-1284.patch
>
>
> When LCE & cgroups are enabled and a container is killed (in this case 
> by its owning AM, an MRAM), there seems to be a race condition at the OS level 
> between delivering the SIGTERM/SIGKILL and the OS doing all necessary cleanup. 
> LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, 
> immediately attempts to clean up the cgroups entry for the container. But 
> this is failing with an error like:
> {code}
> 2013-10-07 15:21:24,359 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1381179532433_0016_01_11 is : 143
> 2013-10-07 15:21:24,359 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Processing container_1381179532433_0016_01_11 of type 
> UPDATE_DIAGNOSTICS_MSG
> 2013-10-07 15:21:24,359 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> deleteCgroup: 
> /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11
> 2013-10-07 15:21:24,359 WARN 
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> Unable to delete cgroup at: 
> /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11
> {code}
> CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM 
> containers to avoid this problem. It seems this should be done for all 
> containers.
> Still, waiting an extra 500 ms seems too expensive.
> We should look at a more time-efficient way of doing this, maybe spinning 
> until the deleteCgroup() succeeds, with a minimal sleep between attempts and 
> an overall timeout.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

2013-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791356#comment-13791356
 ] 

Hudson commented on YARN-879:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4579 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4579/])
YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. 
Contributed by Junping Du. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
> --
>
> Key: YARN-879
> URL: https://issues.apache.org/jira/browse/YARN-879
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.2.1
>
> Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, 
> YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch
>
>
> getResources() should return the list of containers allocated by the RM. 
> However, it currently returns null directly. Worse, if LOG.debug is 
> enabled, this will definitely cause an NPE.
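
A minimal sketch of the kind of fix described, assuming a hypothetical, 
simplified stand-in for the test helper's getResources(); the class, field, 
and element types below are illustrative and not the exact Application.java 
code.

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hand back the containers that were actually allocated
// instead of null, so debug logging that iterates over the result cannot NPE.
class ApplicationSketch {
  private final List<String> allocatedContainerIds = new ArrayList<String>();

  public synchronized List<String> getResources() {
    // Return a defensive copy of the allocated containers rather than null.
    return new ArrayList<String>(allocatedContainerIds);
  }
}
{code}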



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

2013-10-10 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791348#comment-13791348
 ] 

Devaraj K commented on YARN-879:


+1, the latest patch looks good to me. I will commit this shortly.

> Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
> --
>
> Key: YARN-879
> URL: https://issues.apache.org/jira/browse/YARN-879
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, 
> YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch
>
>
> getResources() should return the list of containers allocated by the RM. 
> However, it currently returns null directly. Worse, if LOG.debug is 
> enabled, this will definitely cause an NPE.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.

2013-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791335#comment-13791335
 ] 

Hadoop QA commented on YARN-1289:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607730/YARN-1289.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync
  org.apache.hadoop.yarn.server.nodemanager.TestEventFlow
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2159//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2159//console

This message is automatically generated.

> Configuration "yarn.nodemanager.aux-services" should have default value for 
> mapreduce_shuffle.
> --
>
> Key: YARN-1289
> URL: https://issues.apache.org/jira/browse/YARN-1289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: wenwupeng
>Assignee: Junping Du
> Attachments: YARN-1289.patch
>
>
> The benchmark failed to run when the yarn.nodemanager.aux-services value was 
> not configured in yarn-site.xml; it would be better to have a default value.
> 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
> attempt_1381371516570_0001_m_00_1, Status : FAILED
> Container launch failed for container_1381371516570_0001_01_05 : 
> org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
> auxService:mapreduce_shuffle does not exist
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
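
For reference, the usual manual workaround until a default exists is to set 
the property explicitly in yarn-site.xml; the snippet below is the commonly 
documented configuration, offered as an assumed example rather than text from 
this issue.

{code}
<!-- yarn-site.xml: enable the MapReduce shuffle auxiliary service on NodeManagers -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
{code}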



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory

2013-10-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791313#comment-13791313
 ] 

Junping Du commented on YARN-7:
---

Thanks, Luke, for the review and comments!

> Add support for DistributedShell to ask for CPUs along with memory
> --
>
> Key: YARN-7
> URL: https://issues.apache.org/jira/browse/YARN-7
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.1-beta
>Reporter: Arun C Murthy
>Assignee: Junping Du
>  Labels: patch
> Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch, 
> YARN-7-v4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)