[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792357#comment-13792357 ]

Chris Nauroth commented on YARN-445:
------------------------------------

I haven't had a chance to look at this patch, but I did want to link to MAPREDUCE-5387. We have discussed the possibility of using {{SetConsoleCtrlHandler}}/{{GenerateConsoleCtrlEvent}} to approximate SIGTERM on Windows. (The current task termination logic on Windows is more like a SIGKILL.) Perhaps this patch could be a foundation for that.

> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>         Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445.patch
>
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
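As an editorial aside, the signals discussed above can be sketched as a small lookup; this is purely illustrative (the class, method, and API shape below are hypothetical and not taken from any YARN-445 patch):

```java
import java.util.Map;

/**
 * Illustrative sketch only: maps the POSIX signals discussed above
 * (SIGQUIT for jstack-style thread dumps, SIGTERM/SIGKILL for shutdown)
 * to their conventional Linux signal numbers. The class name and API
 * are hypothetical, not part of any attached patch.
 */
public class ContainerSignalSketch {
    private static final Map<String, Integer> SIGNALS = Map.of(
        "SIGQUIT", 3,   // thread dump without killing the process
        "SIGUSR1", 10,  // application-defined
        "SIGTERM", 15,  // graceful shutdown request
        "SIGKILL", 9    // forced kill (current Windows behavior is closer to this)
    );

    public static int signalNumber(String name) {
        Integer n = SIGNALS.get(name);
        if (n == null) {
            throw new IllegalArgumentException("Unknown signal: " + name);
        }
        return n;
    }

    // On Unix, delivery could then shell out, e.g. "kill -3 <pid>".
    public static String killCommand(String name, long pid) {
        return "kill -" + signalNumber(name) + " " + pid;
    }
}
```

A NodeManager-side implementation would of course deliver the signal to the container's process group rather than building a command string; the string form here just keeps the sketch self-contained.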
[jira] [Commented] (YARN-1289) Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792310#comment-13792310 ]

Junping Du commented on YARN-1289:
----------------------------------

I think the unit test failure is because other services are unnecessarily loading ShuffleHandler after this change. Maybe the right way is to change serviceInit in NodeManager to set the default property there?

> Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-1289
>                 URL: https://issues.apache.org/jira/browse/YARN-1289
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: wenwupeng
>            Assignee: Junping Du
>         Attachments: YARN-1289.patch
>
> Failed to run a benchmark when the yarn.nodemanager.aux-services value is not configured in yarn-site.xml; it is better to provide a default value.
> 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED
> Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>         at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>         at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>         at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
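For reference, the workaround until a default exists is to set the property explicitly in yarn-site.xml; the snippet below is the standard MapReduce shuffle wiring:

```xml
<!-- yarn-site.xml: explicit aux-service configuration that this issue
     proposes to make the default -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```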
[jira] [Commented] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper
[ https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792303#comment-13792303 ]

Ted Yu commented on YARN-1296:
------------------------------

I found these two fair-scheduler-allocation.xml:
  ./hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
  ./hadoop-tools/hadoop-sls/src/test/resources/fair-scheduler-allocation.xml
But they seem to have '' as top-level element.

> schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-1296
>                 URL: https://issues.apache.org/jira/browse/YARN-1296
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Minor
>         Attachments: yarn-1296-v1.patch
>
> Here is the related code:
> {code}
> public Allocation allocate(ApplicationAttemptId attemptId,
>     List<ResourceRequest> resourceRequests,
>     List<ContainerId> containerIds,
>     List<String> strings, List<String> strings2) {
>   if (metricsON) {
>     final Timer.Context context = schedulerAllocateTimer.time();
> {code}
> samplerLock should be used to guard the access.
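The locking pattern the issue asks for can be sketched generically: any metric shared with a sampler thread is only touched while holding the shared lock. The field and method names below are illustrative, not the actual ResourceSchedulerWrapper code:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Generic sketch of the guarding pattern YARN-1296 proposes: a timer-like
 * metric shared between allocate() callers and a sampler thread is only
 * accessed while holding samplerLock. Names are illustrative only.
 */
public class GuardedTimerSketch {
    private final Lock samplerLock = new ReentrantLock();
    private long invocations = 0;   // stands in for schedulerAllocateTimer

    public void allocate() {
        samplerLock.lock();
        try {
            invocations++;          // timer access now guarded by samplerLock
        } finally {
            samplerLock.unlock();
        }
    }

    public long invocationCount() {
        samplerLock.lock();
        try {
            return invocations;
        } finally {
            samplerLock.unlock();
        }
    }
}
```

The lock/try/finally shape guarantees the lock is released even if the guarded section throws, which is the idiomatic form for explicit locks in Java.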
[jira] [Commented] (YARN-1289) Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792298#comment-13792298 ]

Junping Du commented on YARN-1289:
----------------------------------

The patch does fix the problem: I can now deploy a cluster and run a job successfully without specifying the "yarn.nodemanager.aux-services" value. Will take a look at the unit test failures here.
[jira] [Commented] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper
[ https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792297#comment-13792297 ]

Hadoop QA commented on YARN-1296:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12607939/yarn-1296-v1.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls:

                  org.apache.hadoop.yarn.sls.TestSLSRunner

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2165//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2165//console

This message is automatically generated.
[jira] [Updated] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper
[ https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated YARN-1296:
-------------------------

    Attachment: yarn-1296-v1.patch
[jira] [Created] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper
Ted Yu created YARN-1296:
-------------------------

             Summary: schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper
                 Key: YARN-1296
                 URL: https://issues.apache.org/jira/browse/YARN-1296
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Ted Yu
            Priority: Minor

Here is the related code:
{code}
public Allocation allocate(ApplicationAttemptId attemptId,
    List<ResourceRequest> resourceRequests,
    List<ContainerId> containerIds,
    List<String> strings, List<String> strings2) {
  if (metricsON) {
    final Timer.Context context = schedulerAllocateTimer.time();
{code}
samplerLock should be used to guard the access.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792238#comment-13792238 ]

Bikas Saha commented on YARN-1068:
----------------------------------

This should probably create a new conf and override it, instead of changing things in the original conf.
{code}
+    YarnConfiguration conf = (YarnConfiguration) getConf();
+    conf.set(YarnConfiguration.RM_HA_ID, rmId);
+    return new RMHAServiceTarget(conf);
{code}

Shouldn't it be "transitionToActive"?
{code}
+      RMAuditLogger.logFailure(user.getShortUserName(), "transitionToStandby",
+          adminAcl.toString(), "RMHAProtocolService",
+          "Exception transitioning to active");
{code}

We shouldn't be wrapping some unknown exception into an AccessControlException.
{code}
+  private UserGroupInformation checkAccess(String method) throws AccessControlException {
+    try {
+      return RMServerUtils.verifyAccess(adminAcl, method, LOG);
+    } catch (YarnException e) {
+      throw new AccessControlException(e);
+    }
{code}

This method isn't even throwing an AccessControlException. Then why are transitionToStandby() etc. changing their signatures to throw AccessControlException?
{code}
+  public static UserGroupInformation verifyAccess(
+      AccessControlList acl, String method, final Log LOG)
+      throws YarnException {
{code}

The new name doesn't seem to follow the convention based on other names in that file: YARN_SECURITY_SERVICE_AUTHORIZATION_FOO
{code}
         new Service(
             YarnConfiguration.YARN_SECURITY_SERVICE_AUTHORIZATION_CONTAINER_MANAGEMENT_PROTOCOL,
             ContainerManagementProtocolPB.class),
+        new Service(
+            CommonConfigurationKeys.SECURITY_HA_SERVICE_PROTOCOL_ACL,
+            HAServiceProtocol.class),
{code}

> Add admin support for HA operations
> -----------------------------------
>
>                 Key: YARN-1068
>                 URL: https://issues.apache.org/jira/browse/YARN-1068
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>              Labels: ha
>         Attachments: yarn-1068-10.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch
>
> Support HA admin operations to facilitate transitioning the RM to Active and Standby states.
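The first review point, copy the configuration rather than mutate the caller's, can be sketched with plain java.util.Properties standing in for Hadoop's Configuration, which offers an analogous Configuration(Configuration other) copy constructor. The key string below mirrors the value of YarnConfiguration.RM_HA_ID; treat the helper name as hypothetical:

```java
import java.util.Properties;

/**
 * Sketch of the review suggestion above: instead of setting RM_HA_ID on the
 * caller's configuration object, copy it first so the original stays
 * untouched. Properties stands in for org.apache.hadoop.conf.Configuration;
 * the method name is illustrative, not from the patch under review.
 */
public class ConfCopySketch {
    public static Properties withRmHaId(Properties original, String rmId) {
        Properties copy = new Properties();
        copy.putAll(original);                  // copy, don't mutate the original
        copy.setProperty("yarn.resourcemanager.ha.id", rmId);
        return copy;
    }
}
```

The design point is that getConf() often returns a shared object, so a set() on it leaks the override to every other user of that configuration.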
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792227#comment-13792227 ]

Tsuyoshi OZAWA commented on YARN-1293:
--------------------------------------

Thanks for your review!

> TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
> ------------------------------------------------------------------
>
>                 Key: YARN-1293
>                 URL: https://issues.apache.org/jira/browse/YARN-1293
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: linux
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>             Fix For: 2.2.0
>
>         Attachments: YARN-1293.1.patch
>
> {quote}
> Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
> testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 0.114 sec <<< FAILURE!
> junit.framework.AssertionFailedError: null
>         at junit.framework.Assert.fail(Assert.java:48)
>         at junit.framework.Assert.assertTrue(Assert.java:20)
>         at junit.framework.Assert.assertTrue(Assert.java:27)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
> {quote}
[jira] [Updated] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1293:
---------------------------------

    Hadoop Flags: Reviewed
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792221#comment-13792221 ]

Jian He commented on YARN-1293:
-------------------------------

Patch looks good, thanks for the fix!
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792209#comment-13792209 ]

Hadoop QA commented on YARN-1293:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12607927/YARN-1293.1.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2164//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2164//console

This message is automatically generated.
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792203#comment-13792203 ]

Tsuyoshi OZAWA commented on YARN-1172:
--------------------------------------

Thank you for your comment, Karthik. I'm trying to implement this change only for the YARN-related *SecretManagers for now, because there are some HDFS-related *SecretManagers which extend org.apache.hadoop.security.token.SecretManager.

> Convert *SecretManagers in the RM to services
> ---------------------------------------------
>
>                 Key: YARN-1172
>                 URL: https://issues.apache.org/jira/browse/YARN-1172
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>            Assignee: Tsuyoshi OZAWA
[jira] [Commented] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors
[ https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792191#comment-13792191 ]

Hadoop QA commented on YARN-1295:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12607925/YARN-1295.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2163//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2163//console

This message is automatically generated.

> In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-1295
>                 URL: https://issues.apache.org/jira/browse/YARN-1295
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1295.patch
>
> I missed this when working on YARN-1271.
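For background on the "Text file busy" (ETXTBSY) error named in the title: exec()ing a freshly written script can fail while any process still holds its write descriptor, whereas passing the script as an argument to the shell merely reads it. A minimal sketch of that read-don't-exec direction, assuming a Unix-like system with /bin/sh (this is an editorial illustration, not the attached patch):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of the ETXTBSY workaround: rather than exec()ing a freshly
 * written wrapper script (which can fail with "Text file busy"), hand
 * the script path to the shell, which only reads the file. No execute
 * bit is needed. Assumes a Unix-like OS with /bin/sh.
 */
public class ShellReadSketch {
    public static String runScript(String body) {
        try {
            Path script = Files.createTempFile("wrapper", ".sh");
            Files.write(script, body.getBytes(StandardCharsets.UTF_8));
            // "sh <script>" reads the file instead of exec()ing it
            Process p = new ProcessBuilder("/bin/sh", script.toString()).start();
            String out = new String(p.getInputStream().readAllBytes(),
                                    StandardCharsets.UTF_8).trim();
            p.waitFor();
            Files.deleteIfExists(script);
            return out;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```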
[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792190#comment-13792190 ]

Jian He commented on YARN-1058:
-------------------------------

YARN-1116 fixed the AMRMToken part; MAPREDUCE-5476 fixed the staging-dir part.

> Recovery issues on RM Restart with FileSystemRMStateStore
> ---------------------------------------------------------
>
>                 Key: YARN-1058
>                 URL: https://issues.apache.org/jira/browse/YARN-1058
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> App recovery doesn't work as expected using FileSystemRMStateStore.
> Steps to reproduce:
> - Ran a sleep job with a single map and a sleep time of 2 mins
> - Restarted the RM while the map task is still running
> - The first attempt fails with the following error
> {noformat}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1376294441253_0001_01
>         at org.apache.hadoop.ipc.Client.call(Client.java:1404)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1357)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at $Proxy28.finishApplicationMaster(Unknown Source)
>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
> {noformat}
> - The second attempt fails with a different error:
> {noformat}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist: File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
> {noformat}
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792187#comment-13792187 ]

Tsuyoshi OZAWA commented on YARN-1293:
--------------------------------------

Closing this issue itself is not a problem. The essential problem is that there is no documentation in the Hadoop project about locale settings. IMHO, we should document it instead of fixing this problem. The following documents are candidates for the fix:
1. http://wiki.apache.org/hadoop/HowToContribute
2. BUILDING.txt
What do you think?
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792183#comment-13792183 ]

Karthik Kambatla commented on YARN-1172:
----------------------------------------

When filing the JIRA, I was thinking only of the YARN-related *SecretManagers. I haven't looked into the mechanics of doing that; it might require org.apache.hadoop.security.token.SecretManager to be an AbstractService. If that is the case, it might be better to open a separate Common JIRA for that change alone.
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792177#comment-13792177 ]

Jian He commented on YARN-1293:
-------------------------------

bq. I found that this problem is caused when the system locale is not English.
Ahh, can you please close it? Thanks.
[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792179#comment-13792179 ]

Karthik Kambatla commented on YARN-1058:
----------------------------------------

I have also noticed that this was fixed in my testing of RM HA, but I haven't figured out which change fixed it. [~jianhe], any idea which JIRA might have fixed this?
[jira] [Updated] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1293: - Attachment: YARN-1293.1.patch Fix to set LANG as C. > TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk > -- > > Key: YARN-1293 > URL: https://issues.apache.org/jira/browse/YARN-1293 > Project: Hadoop YARN > Issue Type: Bug > Environment: linux >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Fix For: 2.2.0 > > Attachments: YARN-1293.1.patch > > > {quote} > --- > Test set: > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > --- > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) > Time elapsed: 0.114 sec <<< FAILURE! > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:48) > at junit.framework.Assert.assertTrue(Assert.java:20) > at junit.framework.Assert.assertTrue(Assert.java:27) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
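The attached fix pins LANG to C because the test asserts on English bash diagnostics, which breaks under a non-English locale such as ja_JP.UTF-8. A minimal standalone sketch of the idea, forcing the C locale when spawning a shell (this uses plain ProcessBuilder rather than the NodeManager's launcher; the invalid-env command is illustrative of what the test exercises):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LocaleSafeLaunch {
    // Runs a shell command with LANG/LC_ALL pinned to "C" so diagnostics
    // parsed from stderr are locale-independent.
    public static String runWithCLocale(String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("LANG", "C");
        pb.environment().put("LC_ALL", "C");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        byte[] out = p.getInputStream().readAllBytes();
        p.waitFor();
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // "export 1=A" is invalid env syntax; with LANG=C the resulting
        // error text is always English ("not a valid identifier").
        System.out.println(runWithCLocale("bash", "-c", "export 1=A").trim());
    }
}
```

Without the locale override, the same failure would be reported in the host language, and any string match against the diagnostics becomes environment-dependent.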
[jira] [Assigned] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1293: Assignee: Tsuyoshi OZAWA > TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk > -- > > Key: YARN-1293 > URL: https://issues.apache.org/jira/browse/YARN-1293 > Project: Hadoop YARN > Issue Type: Bug > Environment: linux >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Fix For: 2.2.0 > > > {quote} > --- > Test set: > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > --- > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) > Time elapsed: 0.114 sec <<< FAILURE! > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:48) > at junit.framework.Assert.assertTrue(Assert.java:20) > at junit.framework.Assert.assertTrue(Assert.java:27) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792172#comment-13792172 ] Tsuyoshi OZAWA commented on YARN-1293: -- Hi Jian, LANG in my environment is ja_JP.UTF-8. > TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk > -- > > Key: YARN-1293 > URL: https://issues.apache.org/jira/browse/YARN-1293 > Project: Hadoop YARN > Issue Type: Bug > Environment: linux >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Fix For: 2.2.0 > > > {quote} > --- > Test set: > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > --- > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) > Time elapsed: 0.114 sec <<< FAILURE! > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:48) > at junit.framework.Assert.assertTrue(Assert.java:20) > at junit.framework.Assert.assertTrue(Assert.java:27) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792169#comment-13792169 ] Jian He commented on YARN-1293: --- Hi, [~ozawa], did not reproduce this locally, what is the environment you are running ? > TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk > -- > > Key: YARN-1293 > URL: https://issues.apache.org/jira/browse/YARN-1293 > Project: Hadoop YARN > Issue Type: Bug > Environment: linux >Reporter: Tsuyoshi OZAWA > Fix For: 2.2.0 > > > {quote} > --- > Test set: > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > --- > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) > Time elapsed: 0.114 sec <<< FAILURE! > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:48) > at junit.framework.Assert.assertTrue(Assert.java:20) > at junit.framework.Assert.assertTrue(Assert.java:27) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors
[ https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1295: - Attachment: YARN-1295.patch > In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" > errors > - > > Key: YARN-1295 > URL: https://issues.apache.org/jira/browse/YARN-1295 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1295.patch > > > I missed this when working on YARN-1271. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors
[ https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792161#comment-13792161 ] Sandy Ryza commented on YARN-1295: -- Grepped through the code for "-c" and didn't find anywhere else that needs this change. > In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" > errors > - > > Key: YARN-1295 > URL: https://issues.apache.org/jira/browse/YARN-1295 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > I missed this when working on YARN-1271. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1271) "Text file busy" errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792158#comment-13792158 ] Sandy Ryza commented on YARN-1271: -- These errors are still coming up for me after the patch. I took another look and apparently I had looked at UnixShellScriptBuilder, but missed UnixLocalWrapperScriptBuilder, which also uses the "-c". Filed YARN-1295 for this. Sorry for all the noise. > "Text file busy" errors launching containers again > -- > > Key: YARN-1271 > URL: https://issues.apache.org/jira/browse/YARN-1271 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.2.0 > > Attachments: YARN-1271-branch-2.patch, YARN-1271.patch > > > The error is shown below in the comments. > MAPREDUCE-2374 fixed this by removing "-c" when running the container launch > script. It looks like the "-c" got brought back during the windows branch > merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
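The "Text file busy" error is ETXTBSY: the kernel refuses to exec() a file that any process still holds open for writing. With {{bash -c <path>}}, bash ends up exec()ing the script file itself; with {{bash <path>}}, bash opens and reads the script, so no exec() of the file happens and the race disappears. A standalone Linux-specific sketch reproducing the failure mode outside YARN (temp-file setup is illustrative):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class TextFileBusyDemo {
    // Writes an executable script, deliberately keeps the write fd open
    // (as the NM effectively did while another launch was in flight),
    // then tries both launch styles and returns a summary.
    public static String demo() throws Exception {
        File script = File.createTempFile("launch", ".sh");
        script.deleteOnExit();
        FileOutputStream out = new FileOutputStream(script);
        out.write("#!/bin/bash\nexit 0\n".getBytes("UTF-8"));
        script.setExecutable(true);

        StringBuilder sb = new StringBuilder();
        // exec()ing the file directly (what "bash -c <path>" leads to)
        // fails with ETXTBSY while a writer still has it open.
        try {
            new ProcessBuilder(script.getAbsolutePath()).start().waitFor();
            sb.append("direct exec: succeeded\n");
        } catch (IOException e) {
            sb.append("direct exec failed: ").append(e.getMessage()).append('\n');
        }
        // "bash <path>" has bash *read* the script instead, so it runs
        // even though the write fd is still open.
        int rc = new ProcessBuilder("bash", script.getAbsolutePath())
                .start().waitFor();
        sb.append("bash <path>: exit ").append(rc);
        out.close();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

This is why grepping for remaining "-c" usages matters: every launch path that execs the script file directly is exposed to the same race.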
[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792157#comment-13792157 ] Tsuyoshi OZAWA commented on YARN-1293: -- I found that this problem is caused when the system locale is not English. > TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk > -- > > Key: YARN-1293 > URL: https://issues.apache.org/jira/browse/YARN-1293 > Project: Hadoop YARN > Issue Type: Bug > Environment: linux >Reporter: Tsuyoshi OZAWA > Fix For: 2.2.0 > > > {quote} > --- > Test set: > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > --- > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) > Time elapsed: 0.114 sec <<< FAILURE! > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:48) > at junit.framework.Assert.assertTrue(Assert.java:20) > at junit.framework.Assert.assertTrue(Assert.java:27) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors
Sandy Ryza created YARN-1295: Summary: In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors Key: YARN-1295 URL: https://issues.apache.org/jira/browse/YARN-1295 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza I missed this when working on YARN-1271. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1294) Log4j settings in container-log4j.properties cannot be overridden
Eugene Koifman created YARN-1294: Summary: Log4j settings in container-log4j.properties cannot be overridden Key: YARN-1294 URL: https://issues.apache.org/jira/browse/YARN-1294 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Eugene Koifman Setting HADOOP_ROOT_LOGGER or passing -Dhadoop.root.logger has no effect. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792132#comment-13792132 ] Tsuyoshi OZAWA commented on YARN-1172: -- Should we make org.apache.hadoop.security.token.SecretManager extend AbstractService for this JIRA? Alternatively, we could make only the YARN-related *SecretManagers extend AbstractService. > Convert *SecretManagers in the RM to services > - > > Key: YARN-1172 > URL: https://issues.apache.org/jira/browse/YARN-1172 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Tsuyoshi OZAWA > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1293: - Fix Version/s: 2.2.0 > TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk > -- > > Key: YARN-1293 > URL: https://issues.apache.org/jira/browse/YARN-1293 > Project: Hadoop YARN > Issue Type: Bug > Environment: linux >Reporter: Tsuyoshi OZAWA > Fix For: 2.2.0 > > > {quote} > --- > Test set: > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > --- > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch > testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) > Time elapsed: 0.114 sec <<< FAILURE! > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:48) > at junit.framework.Assert.assertTrue(Assert.java:20) > at junit.framework.Assert.assertTrue(Assert.java:27) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
Tsuyoshi OZAWA created YARN-1293: Summary: TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk Key: YARN-1293 URL: https://issues.apache.org/jira/browse/YARN-1293 Project: Hadoop YARN Issue Type: Bug Environment: linux Reporter: Tsuyoshi OZAWA {quote} --- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch --- Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 0.114 sec <<< FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792121#comment-13792121 ] Jian He commented on YARN-1058: --- Believe we have fixed this, close it. > Recovery issues on RM Restart with FileSystemRMStateStore > - > > Key: YARN-1058 > URL: https://issues.apache.org/jira/browse/YARN-1058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > App recovery doesn't work as expected using FileSystemRMStateStore. > Steps to reproduce: > - Ran sleep job with a single map and sleep time of 2 mins > - Restarted RM while the map task is still running > - The first attempt fails with the following error > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > Password not found for ApplicationAttempt > appattempt_1376294441253_0001_01 > at org.apache.hadoop.ipc.Client.call(Client.java:1404) > at org.apache.hadoop.ipc.Client.call(Client.java:1357) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at $Proxy28.finishApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91) > {noformat} > - The second attempt fails with a different error: > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on > /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist: > File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have > any open files. 
> at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1058. --- Resolution: Fixed > Recovery issues on RM Restart with FileSystemRMStateStore > - > > Key: YARN-1058 > URL: https://issues.apache.org/jira/browse/YARN-1058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > App recovery doesn't work as expected using FileSystemRMStateStore. > Steps to reproduce: > - Ran sleep job with a single map and sleep time of 2 mins > - Restarted RM while the map task is still running > - The first attempt fails with the following error > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > Password not found for ApplicationAttempt > appattempt_1376294441253_0001_01 > at org.apache.hadoop.ipc.Client.call(Client.java:1404) > at org.apache.hadoop.ipc.Client.call(Client.java:1357) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at $Proxy28.finishApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91) > {noformat} > - The second attempt fails with a different error: > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on > /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist: > File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have > any open files. 
> at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792119#comment-13792119 ] Hadoop QA commented on YARN-1182: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607911/yarn-1182-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2162//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2162//console This message is automatically generated. > MiniYARNCluster creates and inits the RM/NM only on start() > --- > > Key: YARN-1182 > URL: https://issues.apache.org/jira/browse/YARN-1182 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1182-1.patch, yarn-1182-2.patch > > > MiniYARNCluster creates and inits the RM/NM only on start(). It should create > and init() during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792113#comment-13792113 ] Junping Du commented on YARN-879: - Thanks Devaraj K for review! > Fix tests w.r.t o.a.h.y.server.resourcemanager.Application > -- > > Key: YARN-879 > URL: https://issues.apache.org/jira/browse/YARN-879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.2.1 > > Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, > YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch > > > getResources() will return a list of containers that allocated by RM. > However, it is now return null directly. The worse thing is: if LOG.debug is > enabled, then it will definitely cause NPE exception. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1172: Assignee: Tsuyoshi OZAWA > Convert *SecretManagers in the RM to services > - > > Key: YARN-1172 > URL: https://issues.apache.org/jira/browse/YARN-1172 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Tsuyoshi OZAWA > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792099#comment-13792099 ] Sandy Ryza commented on YARN-1182: -- +1 > MiniYARNCluster creates and inits the RM/NM only on start() > --- > > Key: YARN-1182 > URL: https://issues.apache.org/jira/browse/YARN-1182 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1182-1.patch, yarn-1182-2.patch > > > MiniYARNCluster creates and inits the RM/NM only on start(). It should create > and init() during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1182: --- Attachment: yarn-1182-2.patch Thanks for the review, Sandy. Here is an updated patch that fixes that. For sanity, I ran all tests under hadoop-mapreduce-project and the change doesn't introduce any test failures. > MiniYARNCluster creates and inits the RM/NM only on start() > --- > > Key: YARN-1182 > URL: https://issues.apache.org/jira/browse/YARN-1182 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1182-1.patch, yarn-1182-2.patch > > > MiniYARNCluster creates and inits the RM/NM only on start(). It should create > and init() during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
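The point of creating the RM/NM during init() rather than start() is that callers of a service can observe and configure it between the two phases. A simplified stand-in for the Hadoop service lifecycle (these are not the actual org.apache.hadoop.service classes, just an illustration of the pattern the patch enables for MiniYARNCluster):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustrative service lifecycle: NEW -> INITED -> STARTED.
abstract class Service {
    private String state = "NEW";
    final void init()  { serviceInit();  state = "INITED";  }
    final void start() { serviceStart(); state = "STARTED"; }
    String getState()  { return state; }
    protected void serviceInit()  {}
    protected void serviceStart() {}
}

class MiniCluster extends Service {
    final List<Service> children = new ArrayList<>();

    @Override
    protected void serviceInit() {
        // Create *and* init children here, not in serviceStart(), so
        // tests can reach into the cluster (e.g. grab the RM) before
        // the cluster is started.
        Service rm = new Service() {};
        rm.init();
        children.add(rm);
    }

    @Override
    protected void serviceStart() {
        for (Service s : children) s.start();
    }
}
```

With the old behavior (children created in start()), a test that called init() and then inspected the cluster would find nothing there yet.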
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792079#comment-13792079 ] Hudson commented on YARN-1265: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4581 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4581/]) YARN-1265. Fair Scheduler chokes on unhealthy node reconnect (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531146) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java > Fair Scheduler chokes on unhealthy node reconnect > - > > Key: YARN-1265 > URL: https://issues.apache.org/jira/browse/YARN-1265 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.1.1-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.2.1 > > Attachments: YARN-1265-1.patch, YARN-1265.patch > > > Only nodes in the RUNNING state are tracked by schedulers. When a node > reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if > it's in the RUNNING state. The FairScheduler doesn't guard against this. > I think the best way to fix this is to check to see whether a node is RUNNING > before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1292) De-link container life cycle from the process it runs
[ https://issues.apache.org/jira/browse/YARN-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792063#comment-13792063 ] Bikas Saha commented on YARN-1292: -- This can be achieved in a backwards compatible manner in the following way 1) StartContainer request will have a new flag that says whether the container is attached to a process or not. Default value is true for back-compat. 2) If the above flag is false then the container is completed on the NM only when a) the RM terminates the container (this currently happens today) b) when the AM call StopContainer on that (this is currently supported) The main change in the NM would be to not trigger end of container, ie keep the container in a running state, when there is no process associated with the container. 3) Create a new api called startProcess() that can be used to launch a new process in a container. NM can dis-allow starting a process while a process is already running for the first cut. This API would be secured using existing AMNM token. No changes are expected to be needed in the RM since the NM will continue to report this container as running to the RM. This should be a fairly localised NM-only change. > De-link container life cycle from the process it runs > - > > Key: YARN-1292 > URL: https://issues.apache.org/jira/browse/YARN-1292 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.1.1-beta >Reporter: Bikas Saha > > Currently, a container is considered done when its OS process exits. This > makes it cumbersome for apps to be able to reuse containers for different > processes. Long running daemons may want to run in the same containers as the > previous versions. So eg. is an hbase region server crashes/upgraded it would > want to restart in the same container where everything it needs would already > be warm and ready. -- This message was sent by Atlassian JIRA (v6.1#6144)
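The back-compat scheme above hinges on one flag whose default preserves today's semantics. A hypothetical sketch (field and method names are illustrative, not the actual YARN API): a container started with the flag cleared stays RUNNING on the NM after its process exits, until the AM stops it or the RM terminates it.

```java
// Illustrative model of the proposed StartContainer flag; not real YARN code.
public class StartContainerRequestSketch {
    private final boolean tiedToProcess;

    // No-arg form keeps today's semantics (container completes when its
    // process exits), so existing AMs are unaffected.
    public StartContainerRequestSketch() {
        this(true);
    }

    public StartContainerRequestSketch(boolean tiedToProcess) {
        this.tiedToProcess = tiedToProcess;
    }

    // Consulted by the NM when the container's process exits: should the
    // container be marked completed, or kept alive for a later
    // startProcess() call?
    public boolean completeOnProcessExit() {
        return tiedToProcess;
    }
}
```

Because the NM keeps reporting such a container as running, the RM needs no change, which is what makes this a localised NM-only modification.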
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792061#comment-13792061 ] Hadoop QA commented on YARN-415: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607895/YARN-415--n6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2161//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2161//console This message is automatically generated. 
> Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n2.patch, YARN-415--n3.patch, > YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1292) De-link container life cycle from the process it runs
Bikas Saha created YARN-1292: Summary: De-link container life cycle from the process it runs Key: YARN-1292 URL: https://issues.apache.org/jira/browse/YARN-1292 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.1-beta Reporter: Bikas Saha Currently, a container is considered done when its OS process exits. This makes it cumbersome for apps to reuse containers for different processes. Long-running daemons may want to run in the same containers as their previous versions. So e.g. if an HBase region server crashes or is upgraded, it would want to restart in the same container, where everything it needs would already be warm and ready. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792046#comment-13792046 ] Alejandro Abdelnur commented on YARN-1265: -- +1 > Fair Scheduler chokes on unhealthy node reconnect > - > > Key: YARN-1265 > URL: https://issues.apache.org/jira/browse/YARN-1265 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.1.1-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1265-1.patch, YARN-1265.patch > > > Only nodes in the RUNNING state are tracked by schedulers. When a node > reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if > it's in the RUNNING state. The FairScheduler doesn't guard against this. > I think the best way to fix this is to check to see whether a node is RUNNING > before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
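The fix proposed in the description above (only remove a node from the scheduler on reconnect if it is RUNNING, i.e. actually tracked) can be sketched as follows. The types here are simplified stand-ins, not the real RMNodeImpl/FairScheduler classes:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed guard from YARN-1265: schedulers only track
// RUNNING nodes, so a reconnect must not ask the scheduler to remove a
// node that was in any other state.
public class ReconnectGuard {
    enum NodeState { NEW, RUNNING, UNHEALTHY, LOST }

    // Minimal stand-in for a scheduler that tracks running nodes by id.
    static class Scheduler {
        final Set<String> trackedNodes = new HashSet<>();
        void addNode(String nodeId) { trackedNodes.add(nodeId); }
        void removeNode(String nodeId) {
            if (!trackedNodes.remove(nodeId)) {
                // This is the case the FairScheduler "choked" on.
                throw new IllegalStateException("Unknown node: " + nodeId);
            }
        }
    }

    // On reconnect: remove-then-re-add, but remove only when RUNNING.
    static void onReconnect(Scheduler sched, String nodeId, NodeState state) {
        if (state == NodeState.RUNNING) {
            sched.removeNode(nodeId);
        }
        sched.addNode(nodeId);
    }

    public static void main(String[] args) {
        Scheduler sched = new Scheduler();
        // An UNHEALTHY node reconnecting must not trigger a removal.
        onReconnect(sched, "node1:8041", NodeState.UNHEALTHY);
        System.out.println(sched.trackedNodes.contains("node1:8041")); // true
    }
}
```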
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-415: - Attachment: YARN-415--n6.patch With the 1st option it's not clear how to implement protection against leaks, as there is no event that can be used to check for leaks in that case. At the same time, current YARN behavior does not support containers surviving after the AM finishes, so the 2nd option is acceptable. This may need to change once there is support for long-lived apps and attempts that stay alive after the AM is stopped. Attaching a patch which implements option #2 and adds a test for it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1291) RM INFO logs limit scheduling speed
[ https://issues.apache.org/jira/browse/YARN-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791999#comment-13791999 ] Sandy Ryza commented on YARN-1291: -- I would like to demote the RMContainerImpl state transition log to DEBUG and use an AsyncAppender for the RMAuditLogger (at least make this configurable if not default). [~vinodkv], as these logs are pretty core, wanted to check what your thoughts are on this? > RM INFO logs limit scheduling speed > --- > > Key: YARN-1291 > URL: https://issues.apache.org/jira/browse/YARN-1291 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > I've been running some microbenchmarks to see how fast the Fair Scheduler can > fill up a cluster and found its performance is significantly hampered by > logging. > I tested with 500 (mock) nodes, and found that: > * Taking out fair scheduler INFO logs on the critical path brought down the > latency from 14000 ms to 6000 ms > * Taking out the INFO that RMContainerImpl logs when a container transitions > brought it down from 6000 ms to 4000 ms > * Taking out RMAuditLogger logs brought it down from 4000 ms to 1700 ms -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1291) RM INFO logs limit scheduling speed
Sandy Ryza created YARN-1291: Summary: RM INFO logs limit scheduling speed Key: YARN-1291 URL: https://issues.apache.org/jira/browse/YARN-1291 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza I've been running some microbenchmarks to see how fast the Fair Scheduler can fill up a cluster and found its performance is significantly hampered by logging. I tested with 500 (mock) nodes, and found that: * Taking out fair scheduler INFO logs on the critical path brought down the latency from 14000 ms to 6000 ms * Taking out the INFO that RMContainerImpl logs when a container transitions brought it down from 6000 ms to 4000 ms * Taking out RMAuditLogger logs brought it down from 4000 ms to 1700 ms -- This message was sent by Atlassian JIRA (v6.1#6144)
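Both changes Sandy floats in the comment above (suppressing the per-container transition INFO logs and making the audit log asynchronous) can be prototyped in log configuration before any code change. Below is an illustrative log4j.xml fragment; the appender name ASYNC_RMAUDIT and the assumption of an existing RMAUDIT appender are mine, not Hadoop's shipped config. Note that log4j 1.x AsyncAppender can only be wired up via XML configuration, not log4j.properties:

```xml
<!-- Illustrative fragment, assuming an existing appender named RMAUDIT. -->
<appender name="ASYNC_RMAUDIT" class="org.apache.log4j.AsyncAppender">
  <param name="BufferSize" value="4096"/>
  <appender-ref ref="RMAUDIT"/>
</appender>

<!-- Route audit events through the async wrapper so the scheduler thread
     is not blocked on I/O. -->
<logger name="org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger" additivity="false">
  <level value="INFO"/>
  <appender-ref ref="ASYNC_RMAUDIT"/>
</logger>

<!-- Suppress the per-container state-transition INFO logs on the critical
     path (the code-level fix would demote them to DEBUG instead). -->
<logger name="org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl">
  <level value="WARN"/>
</logger>
```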
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791956#comment-13791956 ] Karthik Kambatla commented on YARN-1241: Looks good to me. > In Fair Scheduler maxRunningApps does not work for non-leaf queues > -- > > Key: YARN-1241 > URL: https://issues.apache.org/jira/browse/YARN-1241 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, > YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241.patch > > > Setting the maxRunningApps property on a parent queue should ensure that the > sum of running apps in all its subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
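The semantics described above (a running app counts against its leaf queue and every ancestor up to the root) can be sketched with a simple walk up the queue hierarchy. `Queue`, `canRunApp`, and `startApp` are illustrative stand-ins, not the actual FSQueue API:

```java
// Sketch of enforcing maxRunningApps on non-leaf queues (YARN-1241): an
// app may start only if every queue on the path from its leaf to the root
// is below its limit.
public class QueueLimits {
    static class Queue {
        final Queue parent;          // null for the root
        final int maxRunningApps;
        int runningApps;
        Queue(Queue parent, int maxRunningApps) {
            this.parent = parent;
            this.maxRunningApps = maxRunningApps;
        }
    }

    // Walk up the hierarchy; any full ancestor blocks the app.
    static boolean canRunApp(Queue leaf) {
        for (Queue q = leaf; q != null; q = q.parent) {
            if (q.runningApps >= q.maxRunningApps) return false;
        }
        return true;
    }

    // A running app counts against the leaf and every ancestor.
    static void startApp(Queue leaf) {
        for (Queue q = leaf; q != null; q = q.parent) q.runningApps++;
    }

    public static void main(String[] args) {
        Queue root = new Queue(null, 2);             // parent limit: 2 apps
        Queue a = new Queue(root, 10), b = new Queue(root, 10);
        startApp(a);
        startApp(b);
        // Both leaves are far under their own limits, but the parent is full.
        System.out.println(canRunApp(a)); // false
    }
}
```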
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791952#comment-13791952 ] Sandy Ryza commented on YARN-1182: -- Nit: {code} +conf.set(YarnConfiguration.RM_ADMIN_ADDRESS, hostname + ":0"); +conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, hostname + ":0"); +conf.set(YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS, hostname + ":0"); +WebAppUtils.setRMWebAppHostnameAndPort(getConfig(), hostname, 0); {code} getConfig() should be replaced with conf on the last line, right? Otherwise LGTM > MiniYARNCluster creates and inits the RM/NM only on start() > --- > > Key: YARN-1182 > URL: https://issues.apache.org/jira/browse/YARN-1182 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1182-1.patch > > > MiniYARNCluster creates and inits the RM/NM only on start(). It should create > and init() during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1290) Let continuous scheduling achieve more balanced task assignment
Wei Yan created YARN-1290: - Summary: Let continuous scheduling achieve more balanced task assignment Key: YARN-1290 URL: https://issues.apache.org/jira/browse/YARN-1290 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while later nodes receive no tasks. We should sort all nodes according to available resources. In each round, always assign tasks to the nodes with the most available capacity, which balances the load distribution among all nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
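The proposal above amounts to re-sorting nodes by available capacity before each assignment instead of walking a fixed order. A minimal sketch, with `Node` and `assignRound` as illustrative stand-ins (not the real FSSchedulerNode/FairScheduler code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of YARN-1290's balancing idea: offer each request to the node
// with the most available memory at that moment, rather than iterating
// nodes in a fixed pre-order.
public class BalancedAssign {
    static class Node {
        final String id;
        int availableMb;
        Node(String id, int availableMb) { this.id = id; this.availableMb = availableMb; }
    }

    // One scheduling round: greedily place each task on the currently
    // least-loaded (most-available) node.
    static List<String> assignRound(List<Node> nodes, int[] taskMbs) {
        List<String> placements = new ArrayList<>();
        for (int mb : taskMbs) {
            nodes.sort(Comparator.comparingInt((Node n) -> n.availableMb).reversed());
            Node best = nodes.get(0);
            if (best.availableMb >= mb) {
                best.availableMb -= mb;
                placements.add(best.id);
            }
        }
        return placements;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>(List.of(new Node("n1", 8192), new Node("n2", 8192)));
        // Four 2 GB tasks spread evenly across the two equally free nodes,
        // instead of all landing on the first node in a fixed order.
        System.out.println(assignRound(nodes, new int[]{2048, 2048, 2048, 2048}));
    }
}
```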
[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791695#comment-13791695 ] Hadoop QA commented on YARN-1182: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607835/yarn-1182-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2160//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2160//console This message is automatically generated. > MiniYARNCluster creates and inits the RM/NM only on start() > --- > > Key: YARN-1182 > URL: https://issues.apache.org/jira/browse/YARN-1182 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1182-1.patch > > > MiniYARNCluster creates and inits the RM/NM only on start(). It should create > and init() during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()
[ https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1182: --- Attachment: yarn-1182-1.patch Straight-forward patch that moves creation and init to serviceInit(). Ran a couple of tests that use MiniYARNCluster. Submitting patch to see if Jenkins finds any other issues. > MiniYARNCluster creates and inits the RM/NM only on start() > --- > > Key: YARN-1182 > URL: https://issues.apache.org/jira/browse/YARN-1182 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1182-1.patch > > > MiniYARNCluster creates and inits the RM/NM only on start(). It should create > and init() during init() itself. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791525#comment-13791525 ] Hudson commented on YARN-1284: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1574 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1574/]) Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530716) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > LCE: Race condition leaves dangling cgroups entries for killed containers > - > > Key: YARN-1284 > URL: https://issues.apache.org/jira/browse/YARN-1284 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.2.1 > > Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, > YARN-1284.patch, YARN-1284.patch > > > When LCE & cgroups are enabled, when a container is killed (in this case > by its owning AM, an MRAM) there seems to be a race condition at the OS level > between sending the SIGTERM/SIGKILL and the OS completing all necessary cleanup. > LCE code, after sending the SIGTERM/SIGKILL and getting the exit code, > immediately attempts to clean up the cgroups entry for the container. 
But > this is failing with an error like: > {code} > 2013-10-07 15:21:24,359 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1381179532433_0016_01_11 is : 143 > 2013-10-07 15:21:24,359 DEBUG > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Processing container_1381179532433_0016_01_11 of type > UPDATE_DIAGNOSTICS_MSG > 2013-10-07 15:21:24,359 DEBUG > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: > deleteCgroup: > /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 > 2013-10-07 15:21:24,359 WARN > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: > Unable to delete cgroup at: > /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 > {code} > CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM > containers to avoid this problem. It seems this should be done for all > containers. > Still, waiting an extra 500 ms seems too expensive. > We should look at a more time-efficient way of doing this, e.g. spinning until > deleteCgroup() succeeds, with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
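The spin-with-timeout idea floated at the end of the description above can be sketched as below. This is a simplified illustration, not the actual CgroupsLCEResourcesHandler code; `deleteCgroupWithRetry` is a hypothetical name:

```java
import java.io.File;

// Sketch of the retry suggested in YARN-1284: instead of a fixed 500 ms
// wait, retry the cgroup directory removal with a short sleep until it
// succeeds or a deadline passes.
public class CgroupRetry {
    // Try to delete the cgroup directory, retrying every sleepMillis up to
    // timeoutMillis. Returns true once the removal succeeds.
    static boolean deleteCgroupWithRetry(File cgroupDir, long sleepMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            // An empty cgroup directory becomes removable once the kernel
            // has finished reaping the container's tasks.
            if (cgroupDir.delete() || !cgroupDir.exists()) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "fake-cgroup-" + System.nanoTime());
        dir.mkdir();
        // Nothing holds this directory, so the first attempt succeeds.
        System.out.println(deleteCgroupWithRetry(dir, 20, 1000)); // true
    }
}
```

The upside over a fixed sleep is that the common case returns almost immediately, while a slow kernel cleanup still gets the full timeout.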
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791526#comment-13791526 ] Hudson commented on YARN-1283: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1574 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1574/]) YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so that clients don't need to do scheme-mangling. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530819) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java > Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY > - > > Key: YARN-1283 > URL: https://issues.apache.org/jira/browse/YARN-1283 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.1-beta >Reporter: Yesha Vora >Assignee: Omkar Vinit 
Joshi > Labels: newbie > Fix For: 2.2.1 > > Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, > YARN-1283.20131008.2.patch, YARN-1283.3.patch > > > After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect > "The url to track the job". > Currently, it's printing > http://RM:/proxy/application_1381162886563_0001/ instead of > https://RM:/proxy/application_1381162886563_0001/ > http://hostname:8088/proxy/application_1381162886563_0001/ is invalid > hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 > 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at > hostname/100.00.00.000:8032 > 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 > 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. > Instead, use mapreduce.job.user.name > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. > Instead, use mapreduce.job.jar > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapred.map.tasks.speculative.execution is deprecated. Instead, use > mapreduce.map.speculative > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is > deprecated. Instead, use mapreduce.job.reduces > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class > is deprecated. Instead, use mapreduce.job.partitioner.class > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapred.reduce.tasks.speculative.execution is deprecated. Instead, use > mapreduce.reduce.speculative > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapred.mapoutput.value.class is deprecated. Instead, use > mapreduce.map.output.value.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is > deprecated. Instead, use mapreduce.job.map.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is > deprecated. 
Instead, use mapreduce.job.name > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is > deprecated. Instead, use mapreduce.job.reduce.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class > is deprecated. Instead, use mapreduce.job.inputformat.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is > deprecated. Instead, use mapreduce.input.fileinputformat.inputdir > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapreduce.outputformat.class is deprecated. Instead, use > mapreduce.job.outputformat.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is > deprecated. Instead, use mapreduce.job.maps > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class > is deprecated. Instead, use mapreduce.map.output.key.class > 13/10/
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791527#comment-13791527 ] Hudson commented on YARN-879: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1574 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1574/]) YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. Contributed by Junping Du. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java > Fix tests w.r.t o.a.h.y.server.resourcemanager.Application > -- > > Key: YARN-879 > URL: https://issues.apache.org/jira/browse/YARN-879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.2.1 > > Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, > YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch > > > getResources() will return a list of 
containers allocated by the RM. > However, it currently returns null directly. Worse, if LOG.debug is > enabled, this will cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791487#comment-13791487 ] Hudson commented on YARN-879: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1548 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1548/]) YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. Contributed by Junping Du. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791486#comment-13791486 ] Hudson commented on YARN-1283: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1548 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1548/]) YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so that clients don't need to do scheme-mangling. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530819) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791485#comment-13791485 ] Hudson commented on YARN-1284: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1548 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1548/]) Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530716) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791385#comment-13791385 ] Hudson commented on YARN-1283: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #358 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/358/]) YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so that clients don't need to do scheme-mangling. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530819) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java > Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY > - > > Key: YARN-1283 > URL: https://issues.apache.org/jira/browse/YARN-1283 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.1-beta >Reporter: Yesha Vora >Assignee: Omkar Vinit Joshi > Labels: 
newbie > Fix For: 2.2.1 > > Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, > YARN-1283.20131008.2.patch, YARN-1283.3.patch > > > After setting yarn.http.policy=HTTPS_ONLY, the job output shows an incorrect > "The url to track the job". > Currently, it's printing > http://RM:/proxy/application_1381162886563_0001/ instead of > https://RM:/proxy/application_1381162886563_0001/ > http://hostname:8088/proxy/application_1381162886563_0001/ is invalid > hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 > 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at > hostname/100.00.00.000:8032 > 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 > 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. > Instead, use mapreduce.job.user.name > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. > Instead, use mapreduce.job.jar > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapred.map.tasks.speculative.execution is deprecated. Instead, use > mapreduce.map.speculative > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is > deprecated. Instead, use mapreduce.job.reduces > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class > is deprecated. Instead, use mapreduce.job.partitioner.class > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapred.reduce.tasks.speculative.execution is deprecated. Instead, use > mapreduce.reduce.speculative > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapred.mapoutput.value.class is deprecated. Instead, use > mapreduce.map.output.value.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is > deprecated. Instead, use mapreduce.job.map.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is > deprecated. Instead, use mapreduce.job.name > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is > deprecated. 
Instead, use mapreduce.job.reduce.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class > is deprecated. Instead, use mapreduce.job.inputformat.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is > deprecated. Instead, use mapreduce.input.fileinputformat.inputdir > 13/10/07 18:39:40 INFO Configuration.deprecation: > mapreduce.outputformat.class is deprecated. Instead, use > mapreduce.job.outputformat.class > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is > deprecated. Instead, use mapreduce.job.maps > 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class > is deprecated. Instead, use mapreduce.map.output.key.class > 13/10/07 18:39:40
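The YARN-1283 fix above makes the RM hand out a fully-qualified proxy URL so clients no longer guess the scheme. As a minimal illustrative sketch (not the actual WebAppUtils/RMAppAttemptImpl code; class and method names here are hypothetical), the core of the fix is choosing the URL scheme from the yarn.http.policy value:

```java
// Hypothetical sketch of scheme selection for the tracking URL.
// With yarn.http.policy=HTTPS_ONLY the web apps are served over HTTPS only,
// so the proxy URL must carry the https scheme explicitly.
public class ProxyUrlSketch {
    static String schemeFor(String httpPolicy) {
        return "HTTPS_ONLY".equals(httpPolicy) ? "https://" : "http://";
    }

    // Build a fully-qualified tracking URL instead of a scheme-less one,
    // so clients don't need to do any scheme-mangling themselves.
    static String trackingUrl(String httpPolicy, String proxyHostPort, String appId) {
        return schemeFor(httpPolicy) + proxyHostPort + "/proxy/" + appId + "/";
    }

    public static void main(String[] args) {
        System.out.println(trackingUrl("HTTPS_ONLY", "rm-host:8090",
                "application_1381162886563_0001"));
    }
}
```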
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791386#comment-13791386 ] Hudson commented on YARN-879: - SUCCESS: Integrated in Hadoop-Yarn-trunk #358 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/358/]) YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. Contributed by Junping Du. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java > Fix tests w.r.t o.a.h.y.server.resourcemanager.Application > -- > > Key: YARN-879 > URL: https://issues.apache.org/jira/browse/YARN-879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.2.1 > > Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, > YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch > > > getResources() will return a list of containers that 
are allocated by the RM. > However, it currently returns null directly. Worse, if LOG.debug is > enabled, this will definitely cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
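The YARN-879 description boils down to a common null-safety pitfall: a getter that returns null forces every caller, including debug logging, to null-check. A small illustrative sketch (not the actual test Application class; names here are hypothetical) of the buggy and fixed variants:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the NPE described in YARN-879: a getter returning
// null blows up in callers such as LOG.debug("got " + getResources().size()).
public class ResourcesSketch {
    private final List<String> containers = new ArrayList<>();

    // Buggy variant (commented out): returning null makes any
    // getResources().size() call throw a NullPointerException.
    // List<String> getResources() { return null; }

    // Fixed variant: always return a (possibly empty) defensive copy,
    // so debug logging and iteration are safe without null checks.
    List<String> getResources() {
        return new ArrayList<>(containers);
    }

    void addContainer(String id) {
        containers.add(id);
    }
}
```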
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791384#comment-13791384 ] Hudson commented on YARN-1284: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #358 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/358/]) Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530716) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > LCE: Race condition leaves dangling cgroups entries for killed containers > - > > Key: YARN-1284 > URL: https://issues.apache.org/jira/browse/YARN-1284 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.2.1 > > Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, > YARN-1284.patch, YARN-1284.patch > > > When LCE & cgroups are enabled and a container is killed (in this case > by its owning AM, an MRAM), there seems to be a race condition at the OS level > between sending the SIGTERM/SIGKILL and the OS completing all necessary cleanup. > The LCE code, after sending the SIGTERM/SIGKILL and getting the exit code, > immediately attempts to clean up the cgroups entry for the container. 
But > this is failing with an error like: > {code} > 2013-10-07 15:21:24,359 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1381179532433_0016_01_11 is : 143 > 2013-10-07 15:21:24,359 DEBUG > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Processing container_1381179532433_0016_01_11 of type > UPDATE_DIAGNOSTICS_MSG > 2013-10-07 15:21:24,359 DEBUG > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: > deleteCgroup: > /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 > 2013-10-07 15:21:24,359 WARN > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: > Unable to delete cgroup at: > /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 > {code} > CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM > containers to avoid this problem. It seems this should be done for all > containers. > Still, waiting an extra 500 ms seems too expensive. > We should look at a more time-efficient way of doing this, perhaps > spinning until deleteCgroup() succeeds, with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
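The "spin with a minimal sleep and a timeout" idea proposed for YARN-1284 can be sketched as a retry loop. This is illustrative code, not the actual CgroupsLCEResourcesHandler implementation: deleting the cgroup entry is modeled as a plain File.delete() (removing an empty cgroup directory is an rmdir), and the sleep/timeout values are assumptions.

```java
import java.io.File;

// Hedged sketch: retry cgroup directory deletion with a short sleep and a
// deadline, instead of a single fixed 500 ms wait. rmdir on a cgroup fails
// until the kernel has released all tasks in it, so a quick retry loop
// usually finishes much sooner than a fixed-length wait.
public class CgroupDeleteSketch {
    static boolean deleteWithRetry(File cgroupDir, long timeoutMs, long sleepMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cgroupDir.delete()) {
                return true;        // cgroup removed; no dangling entry
            }
            Thread.sleep(sleepMs);  // minimal sleep before the next attempt
        }
        return cgroupDir.delete();  // one last attempt at the deadline
    }
}
```

On success the loop exits as soon as the kernel allows the rmdir, so the common case pays only one short sleep rather than the full timeout.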
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791356#comment-13791356 ] Hudson commented on YARN-879: - SUCCESS: Integrated in Hadoop-trunk-Commit #4579 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4579/]) YARN-879. Fixed tests w.r.t o.a.h.y.server.resourcemanager.Application. Contributed by Junping Du. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530902) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java > Fix tests w.r.t o.a.h.y.server.resourcemanager.Application > -- > > Key: YARN-879 > URL: https://issues.apache.org/jira/browse/YARN-879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.2.1 > > Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, > YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch > > > getResources() will return a list of containers 
that are allocated by the RM. > However, it currently returns null directly. Worse, if LOG.debug is > enabled, this will definitely cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791348#comment-13791348 ] Devaraj K commented on YARN-879: +1, the latest patch looks good to me; will commit this shortly. > Fix tests w.r.t o.a.h.y.server.resourcemanager.Application > -- > > Key: YARN-879 > URL: https://issues.apache.org/jira/browse/YARN-879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, > YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch > > > getResources() should return a list of containers allocated by the RM. > However, it currently returns null directly. Worse, if LOG.debug is > enabled, this will definitely cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1289) Configuration "yarn.nodemanager.aux-services" should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791335#comment-13791335 ] Hadoop QA commented on YARN-1289: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607730/YARN-1289.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2159//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2159//console This message is automatically generated. > Configuration "yarn.nodemanager.aux-services" should have default value for > mapreduce_shuffle. > -- > > Key: YARN-1289 > URL: https://issues.apache.org/jira/browse/YARN-1289 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: wenwupeng >Assignee: Junping Du > Attachments: YARN-1289.patch > > > Failed to run a benchmark when yarn.nodemanager.aux-services is not > configured in yarn-site.xml; it would be better to have a default value. 
> 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : > attempt_1381371516570_0001_m_00_1, Status : FAILED > Container launch failed for container_1381371516570_0001_01_05 : > org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The > auxService:mapreduce_shuffle does not exist > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) > at > org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
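Until a default exists, the InvalidAuxServiceException above is avoided by configuring the shuffle auxiliary service explicitly in yarn-site.xml. These are the standard Hadoop 2.x property names and the stock ShuffleHandler class:

```xml
<!-- yarn-site.xml: register the MapReduce shuffle as a NodeManager
     auxiliary service so container launch can find mapreduce_shuffle. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```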
[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791313#comment-13791313 ] Junping Du commented on YARN-7: --- Thanks Luke for review and comments! > Add support for DistributedShell to ask for CPUs along with memory > -- > > Key: YARN-7 > URL: https://issues.apache.org/jira/browse/YARN-7 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.1-beta >Reporter: Arun C Murthy >Assignee: Junping Du > Labels: patch > Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch, > YARN-7-v4.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)