[jira] [Commented] (YARN-1458) hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836372#comment-13836372 ]

qingwu.fu commented on YARN-1458:
---------------------------------

We have tested our suspected points, and it doesn't work. We will now focus on handling the case where ComputeFairShares#computeShares returns 0. How about: if it returns 0, we just count its weight as in the situation where sizebasedweight is true; that is, if it returns 0, we can set its weight to 1.

hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
-------------------------------------------------------------------------

Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0
Reporter: qingwu.fu
Labels: patch
Original Estimate: 408h
Remaining Estimate: 408h

The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:

"ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rekha Joshi updated YARN-1019:
------------------------------
Attachment: YARN-1019.0.patch

YarnConfiguration validation for local disk path and http addresses.
---------------------------------------------------------------------

Key: YARN-1019
URL: https://issues.apache.org/jira/browse/YARN-1019
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Omkar Vinit Joshi
Priority: Minor
Labels: newbie
Attachments: YARN-1019.0.patch

Today we are not validating certain configuration parameters set in yarn-site.xml.
1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler actually creates the directories.
2) The same goes for all the parameters using hostname:port, unless we are OK with the default port.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836397#comment-13836397 ]

Hadoop QA commented on YARN-1019:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616524/YARN-1019.0.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2567//console

This message is automatically generated.

YarnConfiguration validation for local disk path and http addresses.
---------------------------------------------------------------------

Key: YARN-1019
URL: https://issues.apache.org/jira/browse/YARN-1019
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Omkar Vinit Joshi
Priority: Minor
Labels: newbie
Attachments: YARN-1019.0.patch

Today we are not validating certain configuration parameters set in yarn-site.xml.
1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler actually creates the directories.
2) The same goes for all the parameters using hostname:port, unless we are OK with the default port.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
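To make the proposed check concrete: a minimal sketch of the kind of startup-time validation the report asks for, with a hypothetical helper class. This is not the attached YARN-1019.0.patch, just an illustration using plain JDK calls.

{code}
import java.nio.file.Paths;

/**
 * Hypothetical startup check: reject relative paths in comma-separated
 * directory settings (e.g. local-dirs, log-dirs) before the directory
 * handler tries to create them, so the NM fails fast with a clear message.
 */
public final class PathConfigValidator {
  public static void validateAbsolute(String key, String commaSeparatedDirs) {
    for (String dir : commaSeparatedDirs.split(",")) {
      String trimmed = dir.trim();
      if (!trimmed.isEmpty() && !Paths.get(trimmed).isAbsolute()) {
        throw new IllegalArgumentException(
            key + " must contain absolute paths, but got: " + trimmed);
      }
    }
  }
}
{code}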
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836410#comment-13836410 ]

Steve Loughran commented on YARN-1390:
--------------------------------------

# Some limits on tag size are going to be needed, obviously. If AMs can update tag data, they can use it as a store of information, which would be convenient and dangerous.
# App metadata is visible to all, so users need to be reminded to limit what they say.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836701#comment-13836701 ]

Zhijie Shen commented on YARN-1445:
-----------------------------------

1. I agree on most of the plans for dealing with FINISHING.

bq. 3. WebAppProxyServlet::doGet(). I think we might need to handle FINISHING as well as FINISHED.

After the RMApp enters FINISHING, the AM has already been unregistered, and the tracking url has already been updated. Therefore, we're able to redirect the request in this state.

bq. 5. ClientServiceDelegate::getProxy(), we might need to handle YarnApplicationState.FINISHING, too.

When the application is in FINISHING, the AM has already been unregistered, and the MR job needs to be redirected to the JHS for the details.

bq. 6. ApplicationCLI::killApplication(). Here is a question mark: can we kill the AM when the RMApp is in the FINISHING state, since the AM does not really exist anymore? At least for DS and MR, once the AM has called unregisterApplicationMaster, the application is finished, so sending a kill event at that point is meaningless. Here, I just handle YarnApplicationState.FINISHING and FINISHED in the same way.

I have a different opinion here. RMAppImpl actually allows killing the app when it is in FINISHING. Moreover, if we handle FINISHING in the same way as FINISHED, we will see that printApplicationReport says the app is still in FINISHING, while killApplication says the app is finished. Thoughts?

2. TestYarnClient#testSubmitApplication needs to be changed accordingly as well. Would you please double check the other test classes where FINISHED is referred to, and check whether the test cases will be broken or not?

3. Is it good to document the difference in detail between FINISHING and FINISHED?

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
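For reference, the shape of the check being debated: treat FINISHING like FINISHED wherever only AM unregistration matters, but keep them distinct for kill semantics. A sketch that assumes the FINISHING value this JIRA proposes to add to YarnApplicationState; the helper name is hypothetical.

{code}
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

/**
 * Sketch: once the AM has unregistered, the tracking URL already points at
 * the history service, so redirects can treat FINISHING like FINISHED.
 * Kill handling should NOT use this helper, since FINISHING apps are still
 * killable per RMAppImpl.
 */
static boolean amUnregistered(YarnApplicationState state) {
  switch (state) {
    case FINISHING:  // proposed by this JIRA
    case FINISHED:
    case FAILED:
    case KILLED:
      return true;
    default:
      return false;
  }
}
{code}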
[jira] [Assigned] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang reassigned YARN-1463:
-----------------------------------

Assignee: Binglin Chang

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836713#comment-13836713 ]

Binglin Chang commented on YARN-1463:
-------------------------------------

HDFS-5545 introduced this bug: when deciding whether to init SPNEGO, the original code logic is broken.

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated YARN-1463:
--------------------------------
Attachment: YARN-1463.v1.patch

Attaching a patch with a simple fix; the test succeeds now.

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang
Attachments: YARN-1463.v1.patch

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836723#comment-13836723 ]

Alejandro Abdelnur commented on YARN-1390:
------------------------------------------

Agree with Steve, we should limit the length of a tag and the number of tags. I'd suggest hardcoding for now, i.e. 50 chars / 10 tags, and going configurable later if the need arises.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836734#comment-13836734 ]

Hadoop QA commented on YARN-1463:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616579/YARN-1463.v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2568//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2568//console

This message is automatically generated.

TestContainerManagerSecurity#testContainerManager fails
-------------------------------------------------------

Key: YARN-1463
URL: https://issues.apache.org/jira/browse/YARN-1463
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Binglin Chang
Attachments: YARN-1463.v1.patch

Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
        at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836740#comment-13836740 ]

Zhijie Shen commented on YARN-1446:
-----------------------------------

Hm... I see. The patch is generally good. I have the following comments.

1. It's better to use KillApplicationRequest.newInstance:
{code}
+    KillApplicationRequest req =
+        Records.newRecord(KillApplicationRequest.class);
{code}

2. Fix the grammar below:
{code}
+    // reaches killed state.and also check that attempt state is saved before app
{code}

3. Previously, an app could be killed at FINISHING. With the following change in ClientRMService, that seems to be no longer applicable:
{code}
+    if (application.isAppSafeToTerminate()) {
+      return KillApplicationResponse.newInstance(true);
+    }
{code}

4. Instead of logging the killing info every 100ms, how about doing something similar to YarnClientImpl#submitApplication?

5. Do you have an estimation of the number of KILLING requests that are sent before KILLING succeeds?

6. Does this ticket overlap a bit with YARN-261? After the change, it is actually killing the attempt instead of the app, but we don't allow retry here.

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
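Review point 1 above, sketched out: the typed factory keeps the call site to one line. The wrapper method here is only for illustration.

{code}
import org.apache.hadoop.yarn.api.protocolrecords.KillApplicationRequest;
import org.apache.hadoop.yarn.api.records.ApplicationId;

/** Reviewer's suggestion: prefer the typed factory over Records.newRecord. */
static KillApplicationRequest buildKillRequest(ApplicationId appId) {
  return KillApplicationRequest.newInstance(appId);
}
{code}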
[jira] [Commented] (YARN-1458) hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836747#comment-13836747 ]

Sandy Ryza commented on YARN-1458:
----------------------------------

If size-based weight is turned on and an app has 0 demand, I think giving it 0 fair share is the correct thing to do. I.e., if there are two apps and one has 0 demand, the other app should get the entire share. We just need to handle the special case where all apps in a queue have 0 weight and make it so that this does not result in an infinite loop in the computeShares method.

hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
-------------------------------------------------------------------------

Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0
Reporter: qingwu.fu
Labels: patch
Original Estimate: 408h
Remaining Estimate: 408h

--
This message was sent by Atlassian JIRA
(v6.1#6144)
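A self-contained sketch of the guard Sandy describes, not the committed patch: ComputeFairShares binary-searches a weight-to-resource ratio, and an all-zero weight sum must be handled before the search, or the loop can never converge while the scheduler lock is held. The class name, bounds, and iteration limit here are illustrative.

{code}
import java.util.Arrays;

public final class FairShareSketch {
  /** Compute integer shares of totalResource proportional to weights. */
  static int[] computeShares(double[] weights, int totalResource) {
    int[] shares = new int[weights.length];
    double weightSum = Arrays.stream(weights).sum();
    if (weightSum == 0.0) {
      return shares;  // all weights zero: give zero shares, skip the search
    }
    // Binary-search the weight-to-resource ratio, with a bounded iteration
    // count so the update thread can never spin indefinitely under the lock.
    double lo = 0.0;
    double hi = totalResource / weightSum * 2.0 + 1.0;
    for (int i = 0; i < 50; i++) {
      double mid = (lo + hi) / 2.0;
      long used = 0;
      for (double w : weights) {
        used += (long) (w * mid);
      }
      if (used < totalResource) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    for (int j = 0; j < weights.length; j++) {
      shares[j] = (int) (weights[j] * lo);
    }
    return shares;
  }
}
{code}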
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1458:
-----------------------------
Summary: In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely  (was: hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked)

In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
---------------------------------------------------------------------------------------

Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0
Reporter: qingwu.fu
Labels: patch
Original Estimate: 408h
Remaining Estimate: 408h

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1458:
-----------------------------
Description:
The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
{code}
"ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)
{code}

was:
The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
"ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked <0x00070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836804#comment-13836804 ]

Zhijie Shen commented on YARN-967:
----------------------------------

The aforementioned issues are fixed in the last patch.

[YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
------------------------------------------------------------------------------------

Key: YARN-967
URL: https://issues.apache.org/jira/browse/YARN-967
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1462:
-----------------------------------
Summary: AHS API and other AHS changes to handle tags for completed MR jobs  (was: AHS API and JHS changes to handle tags for completed MR jobs)

AHS API and other AHS changes to handle tags for completed MR jobs
-------------------------------------------------------------------

Key: YARN-1462
URL: https://issues.apache.org/jira/browse/YARN-1462
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla

AHS related work for tags.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (YARN-1465) define and add shared constants and utilities for the shared cache
Sangjin Lee created YARN-1465:
---------------------------------

Summary: define and add shared constants and utilities for the shared cache
Key: YARN-1465
URL: https://issues.apache.org/jira/browse/YARN-1465
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836823#comment-13836823 ]

Steve Loughran commented on YARN-1390:
--------------------------------------

Oh, and restrict the tag names to stuff that works well in URLs.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (YARN-1466) implement the cleaner service for the shared cache
Sangjin Lee created YARN-1466:
---------------------------------

Summary: implement the cleaner service for the shared cache
Key: YARN-1466
URL: https://issues.apache.org/jira/browse/YARN-1466
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (YARN-1467) implement checksum verification for resource localization service for the shared cache
Sangjin Lee created YARN-1467:
---------------------------------

Summary: implement checksum verification for resource localization service for the shared cache
Key: YARN-1467
URL: https://issues.apache.org/jira/browse/YARN-1467
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836829#comment-13836829 ]

Karthik Kambatla commented on YARN-1399:
----------------------------------------

From the discussion on YARN-1390: the tags can be a list of Strings, with limits on the number of tags (hardcoded to 10 for now) and on what goes in a tag (50 characters that behave well in URLs).

Allow users to annotate an application with multiple tags
----------------------------------------------------------

Key: YARN-1399
URL: https://issues.apache.org/jira/browse/YARN-1399
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

Nowadays, when submitting an application, users can fill in the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
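The limits copied over from YARN-1390, sketched as a validator. The 10/50 bounds and the URL-safe character class are the suggestion under discussion, not a committed API; the class is hypothetical.

{code}
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical tag validator: at most 10 tags, 50 chars each, URL-friendly. */
public final class TagValidator {
  private static final int MAX_TAGS = 10;
  private static final int MAX_TAG_LENGTH = 50;
  private static final Pattern URL_SAFE = Pattern.compile("[A-Za-z0-9._-]+");

  public static void validate(Set<String> tags) {
    if (tags.size() > MAX_TAGS) {
      throw new IllegalArgumentException("Too many tags: " + tags.size());
    }
    for (String tag : tags) {
      if (tag.length() > MAX_TAG_LENGTH || !URL_SAFE.matcher(tag).matches()) {
        throw new IllegalArgumentException("Invalid tag: " + tag);
      }
    }
  }
}
{code}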
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836831#comment-13836831 ]

Karthik Kambatla commented on YARN-1390:
----------------------------------------

Agree with Steve and Alejandro. Copied the gist to YARN-1399.

Provide a way to capture source of an application to be queried through REST or Java Client APIs
-------------------------------------------------------------------------------------------------

Key: YARN-1390
URL: https://issues.apache.org/jira/browse/YARN-1390
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, and (2) potentially adding source-specific optimizations in the future. Examples of sources are: user-defined project names, Pig, Hive, Oozie, Sqoop, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1445:
----------------------------
Attachment: YARN-1445.3.patch

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836841#comment-13836841 ]

Xuan Gong commented on YARN-1445:
---------------------------------

bq. I have a different opinion here. RMAppImpl actually allows killing the app when it is in FINISHING. Moreover, if we handle FINISHING in the same way as FINISHED, we will see that printApplicationReport says the app is still in FINISHING, while killApplication says the app is finished. Thoughts?

Makes sense.

bq. 2. TestYarnClient#testSubmitApplication needs to be changed accordingly as well. Would you please double check the other test classes where FINISHED is referred to, and check whether the test cases will be broken or not?

Good catch. Fixed. The other places, I think, are fine. I did a full run of all the tests for the hadoop-yarn project.

bq. 3. Is it good to document the difference in detail between FINISHING and FINISHED?

Added.

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836847#comment-13836847 ]

Zhijie Shen commented on YARN-1399:
-----------------------------------

I agree we should have limits on the number of tags and their length. Whether they are configurable or hardcoded, IMHO, we should expose the information to users. For example, if a user inputs a tag which is too long to be accepted, the RM should return a suitable exception. In addition, I think it's also good to regulate the charset that a tag can use, to keep users from entering strange characters. Moreover, in general, IMHO, user name, queue name, application name and application type should be regulated as well. Thoughts?

Allow users to annotate an application with multiple tags
----------------------------------------------------------

Key: YARN-1399
URL: https://issues.apache.org/jira/browse/YARN-1399
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

Nowadays, when submitting an application, users can fill in the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services, etc.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836857#comment-13836857 ]

Jian He commented on YARN-1446:
-------------------------------

Thanks Zhijie for the review.

bq. Previously, an app could be killed at FINISHING. With the following change in ClientRMService, that seems to be no longer applicable.

Investigating more, I think the kill event can be ignored when the app is in the FINISHING state, because the attempt is anyway ignoring the kill event at FINISHING. Here, we made the decision that an application that has called unregister, even if still at the FINAL_SAVING state, is not killable; sounds reasonable? Updated the patch accordingly.

bq. Do you have an estimation of the number of KILLING requests that are sent before KILLING succeeds?

Experimenting on a single-node cluster, on average it sends 2 requests. Added one more check in the isAppSafeToTerminate() method: if recovery is not enabled, just return true.

bq. Does this ticket overlap a bit with YARN-261?

Just took a quick look at the patch on that jira; it may still be needed, as that jira is actually adding functionality to manually fail the attempt, not kill the attempt.

Fixed the other comments also.

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
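The client-side behavior being settled on, sketched with the public YarnClient API: poll the application report until the RM confirms the terminal state, rather than returning on the first ack. Simplified on purpose; a real loop would also stop on FINISHED/FAILED and enforce a timeout.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

/** Sketch: kill and wait until the KILLED state is actually reported. */
static void killAndWait(YarnClient client, ApplicationId appId)
    throws Exception {
  client.killApplication(appId);
  while (client.getApplicationReport(appId).getYarnApplicationState()
      != YarnApplicationState.KILLED) {
    Thread.sleep(100);  // the 100 ms polling interval discussed above
  }
}
{code}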
[jira] [Updated] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1446:
--------------------------
Attachment: YARN-1446.2.patch

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.2.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-291) [Umbrella] Dynamic resource configuration
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836883#comment-13836883 ]

Cindy Li commented on YARN-291:
-------------------------------

Junping, I just saw your comments on YARN-999. I can help with it. Can you help me understand the use cases/scope of YARN-999 besides graceful decommission? In the code below:

// TODO process resource over-commitment case (allocated containers
// > total capacity) in different option by getting value of
// overCommitTimeoutMillis.

By "different options" above, do you mean overCommitTimeoutMillis < 0, = 0, > 0? I want to find out more use cases associated with this setting besides graceful decommission. For example, you mentioned preemption for long-running tasks in YARN-999; is that part of, or a different use case from, graceful decommission? Also, about the August patch CoreAndAdmin.patch (in YARN-291), can you let us know your plan for it? It seems useful for graceful decommission from outside of the YARN code. Thanks,

[Umbrella] Dynamic resource configuration
------------------------------------------

Key: YARN-291
URL: https://issues.apache.org/jira/browse/YARN-291
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Labels: features
Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-CoreAndAdmin.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch

The current Hadoop YARN resource management logic assumes per-node resource is static during the lifetime of the NM process. Allowing run-time configuration of per-node resource will give us finer granularity of resource elasticity. This allows Hadoop workloads to coexist with other workloads on the same hardware efficiently, whether or not the environment is virtualized. More background and design details can be found in the attached proposal.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1287) Consolidate MockClocks
[ https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836892#comment-13836892 ]

Sebastian Wong commented on YARN-1287:
--------------------------------------

Where should the new MockClock class be placed, directory-wise?

Consolidate MockClocks
----------------------

Key: YARN-1287
URL: https://issues.apache.org/jira/browse/YARN-1287
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Sandy Ryza
Labels: newbie

A bunch of different tests have near-identical implementations of MockClock, for example TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler. They should be consolidated into a single MockClock.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
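For illustration, the consolidated class could be as small as this. It assumes YARN's Clock interface (org.apache.hadoop.yarn.util.Clock) and is a sketch, not the committed class; where it lives (e.g. a shared test-utility tree) is exactly the open question above.

{code}
import org.apache.hadoop.yarn.util.Clock;

/** A single shared mock clock: time advances only when a test says so. */
public class MockClock implements Clock {
  private long time = 0;

  @Override
  public long getTime() {
    return time;
  }

  /** Advance the clock by the given number of milliseconds. */
  public void tick(long ms) {
    time += ms;
  }
}
{code}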
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836894#comment-13836894 ]

Xuan Gong commented on YARN-1028:
---------------------------------

bq. It might appear so, but the actual wait time is controlled by ipc.client.connect.max.retries, which is 10 seconds by default. Verified it on a cluster.

Yes, you are right. It is controlled by ipc.client.connect.max.retries in this case.

Add FailoverProxyProvider like capability to RMProxy
----------------------------------------------------

Key: YARN-1028
URL: https://issues.apache.org/jira/browse/YARN-1028
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch

The RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1446) Change killing application to wait until state store is done
[ https://issues.apache.org/jira/browse/YARN-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836918#comment-13836918 ]

Hadoop QA commented on YARN-1446:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616610/YARN-1446.2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2570//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2570//console

This message is automatically generated.

Change killing application to wait until state store is done
-------------------------------------------------------------

Key: YARN-1446
URL: https://issues.apache.org/jira/browse/YARN-1446
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.1.patch, YARN-1446.2.patch

When a user kills an application, it should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and writing the status to the store, the RM will relaunch this application after it restarts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1442) change yarn minicluster base directory via system property
[ https://issues.apache.org/jira/browse/YARN-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836941#comment-13836941 ]

Mark Miller commented on YARN-1442:
-----------------------------------

+1. The Apache Solr project runs YARN in its tests and currently has to duplicate a bunch of YARN minicluster code to work around this issue.

change yarn minicluster base directory via system property
-----------------------------------------------------------

Key: YARN-1442
URL: https://issues.apache.org/jira/browse/YARN-1442
Project: Hadoop YARN
Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: André Kelpe
Priority: Minor
Attachments: HADOOP-10122.patch

The yarn minicluster used for testing uses the target directory by default. We use gradle for building our projects, and we would like to see it using a different directory. This patch makes it possible to use a different directory by setting the yarn.minicluster.directory system property.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
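The mechanism the patch describes, sketched: the property name comes from the JIRA, while the "target" fallback and the helper name are assumptions for illustration.

{code}
import java.io.File;

/** Sketch: resolve the minicluster base directory from a system property,
 *  falling back to Maven's default "target" directory. */
static File miniClusterBaseDir(String clusterName) {
  String base = System.getProperty("yarn.minicluster.directory", "target");
  return new File(base, clusterName).getAbsoluteFile();
}
{code}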
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-895:
-----------------------------------------
Summary: RM crashes if it restarts while the state-store is down  (was: RM crashes if it restarts while NameNode is in safe mode)

RM crashes if it restarts while the state-store is down
-------------------------------------------------------

Key: YARN-895
URL: https://issues.apache.org/jira/browse/YARN-895
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836960#comment-13836960 ]

Hadoop QA commented on YARN-1445:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616603/YARN-1445.3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site:
org.apache.hadoop.mapreduce.security.TestJHSSecurity

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2569//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2569//console

This message is automatically generated.

Separate FINISHING and FINISHED state in YarnApplicationState
-------------------------------------------------------------

Key: YARN-1445
URL: https://issues.apache.org/jira/browse/YARN-1445
Project: Hadoop YARN
Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch

Today, we map both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1459) Handle supergroups, usergroups and ACLs across RMs during failover
[ https://issues.apache.org/jira/browse/YARN-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836966#comment-13836966 ]

Vinod Kumar Vavilapalli commented on YARN-1459:
-----------------------------------------------

As I was trying to indicate [here|https://issues.apache.org/jira/browse/YARN-1318?focusedCommentId=13834101&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13834101], we may have to think about completely moving them off the local disk, but it radically changes the operator workflow. Today admins edit those files separately; we'll have to move completely towards CLI tools for this to happen.

Handle supergroups, usergroups and ACLs across RMs during failover
-------------------------------------------------------------------

Key: YARN-1459
URL: https://issues.apache.org/jira/browse/YARN-1459
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla

The supergroups, usergroups and ACL configurations are per RM and might have been changed while the RM is running. After failing over, the new Active RM should have the latest configuration from the previously Active RM.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836975#comment-13836975 ] Karthik Kambatla commented on YARN-1399: bq. Moreover, in general, IMHO, user name, queue name, application name and application type should be regulated as well. This would be an incompatible change, and we should probably avoid it if possible. This brings up another interesting issue of handling applicationTypes as a special kind of tag when we get to that. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836980#comment-13836980 ] Alejandro Abdelnur commented on YARN-1399: -- What is the concern with a tag being any valid Unicode string? If queried via the REST API, the values would be URL-encoded, so no harm. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836987#comment-13836987 ] Haohui Mai commented on YARN-1463: -- Can you please explain why it is broken? -- Jenkins does not complain at HDFS-5545. I don't quite get what this patch changes -- it seems to me that the same case is covered by HttpServer#initSpnego(). TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836999#comment-13836999 ] Hudson commented on YARN-1318: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4817 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4817/]) YARN-1318. Promoted AdminService to an Always-On service and merged it into RMHAProtocolService. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1547212) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Fix For: 2.4.0 Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch, yarn-1318-3.patch, yarn-1318-4.patch, 
yarn-1318-4.patch, yarn-1318-5.patch, yarn-1318-6.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837002#comment-13837002 ] Mayank Bansal commented on YARN-967: [~vinodkv] sorry missed your comments Attaching latest patch, Thanks, Mayank [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-967: --- Attachment: YARN-967-12.patch [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837011#comment-13837011 ] Hadoop QA commented on YARN-967: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616622/YARN-967-12.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2571//console This message is automatically generated. [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837020#comment-13837020 ] Zhijie Shen commented on YARN-1399: --- bq. What is the concern with a tag being any valid Unicode string? If queried via the REST API, the values would be URL-encoded, so no harm. For example, do we want to support multiple words in a tag, such as "distributed systems"? That isn't a problem when we do exact matching when searching via tags. However, if we want a somewhat fuzzy match, we may need to take care of word splitting. For user/queue/applicationType, we may want them to be lowercase/uppercase (or be converted to lowercase/uppercase), so matching is case-insensitive. Also, it's good to ignore some characters, such as ?!/={}. Thoughts? Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837027#comment-13837027 ] Jian He commented on YARN-895: -- Fixed the comments. bq. Test: In the HDFS test, you don't wait for any time at all for the client to get exceptions? clientThread.join() is called to wait for the client to get exceptions; the test fails if retry is disabled and passes if retry is enabled. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-895: - Attachment: YARN-895.4.patch RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1147: --- Assignee: Xuan Gong Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.4.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1459) Handle supergroups, usergroups and ACLs across RMs during failover
[ https://issues.apache.org/jira/browse/YARN-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1459: --- Assignee: Xuan Gong Handle supergroups, usergroups and ACLs across RMs during failover -- Key: YARN-1459 URL: https://issues.apache.org/jira/browse/YARN-1459 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong The supergroups, usergroups and ACL configurations are per RM and might have been changed while the RM is running. After failing over, the new Active RM should have the latest configuration from the previously Active RM. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1325) RMHAProtocolService#serviceInit should check configuration contains multiple RM
[ https://issues.apache.org/jira/browse/YARN-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1325: --- Assignee: Xuan Gong RMHAProtocolService#serviceInit should check configuration contains multiple RM --- Key: YARN-1325 URL: https://issues.apache.org/jira/browse/YARN-1325 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Xuan Gong Labels: ha Currently, we can enable RM HA configuration without multiple RM ids(YarnConfiguration.RM_HA_IDS). This behaviour can cause wrong operations. ResourceManager should verify that more than 1 RM id must be specified in RM-HA-IDs. One idea is to support strict mode to enforce this check as configuration(e.g. yarn.resourcemanager.ha.strict-mode.enabled). -- This message was sent by Atlassian JIRA (v6.1#6144)
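For illustration, a minimal sketch of the verification proposed above, assuming it runs during service init when HA is enabled; the class name, exception choice, and message wording are illustrative, not the eventual patch.
{code}
import java.util.Collection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HAIdsCheck {
  // Hypothetical check: with RM HA enabled, yarn.resourcemanager.ha.rm-ids
  // (YarnConfiguration.RM_HA_IDS) must name at least two RMs.
  static void verifyRMIds(Configuration conf) {
    Collection<String> rmIds = conf.getStringCollection(YarnConfiguration.RM_HA_IDS);
    if (rmIds.size() < 2) {
      throw new IllegalArgumentException(YarnConfiguration.RM_HA_IDS
          + " must contain at least two RM ids when HA is enabled, but was: " + rmIds);
    }
  }
}
{code}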
[jira] [Assigned] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1410: --- Assignee: Xuan Gong Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1125) Add shutdown support to non-service RM components
[ https://issues.apache.org/jira/browse/YARN-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1125: --- Assignee: Xuan Gong Add shutdown support to non-service RM components - Key: YARN-1125 URL: https://issues.apache.org/jira/browse/YARN-1125 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong The ResourceManager has certain non-service components like the Scheduler. While transitioning to standby, these components should be completely turned off. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837031#comment-13837031 ] Alejandro Abdelnur commented on YARN-1399: -- I would stick to exact tag matching. Case-insensitive seems reasonable, though I would implement it by lowercasing or uppercasing tags on arrival and when querying. Then the matching is cheapest. Regarding symbols, what is the harm in supporting them? One thing we didn't mention before: on querying I would support only OR; the client must then do any further filtering if it wants AND. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services. -- This message was sent by Atlassian JIRA (v6.1#6144)
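To make the normalize-on-arrival idea concrete, here is a hedged sketch of lowercasing tags at submission and query time and matching with OR semantics; the class and method names are illustrative, not YARN API.
{code}
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TagMatcher {
  // Lowercase tags once on arrival and once per query; exact matching then
  // becomes a cheap, case-insensitive set lookup.
  static Set<String> normalize(Iterable<String> tags) {
    Set<String> out = new HashSet<String>();
    for (String t : tags) {
      out.add(t.toLowerCase(Locale.ENGLISH));
    }
    return out;
  }

  // OR semantics: an app matches if it carries at least one queried tag.
  // AND filtering, if wanted, is left to the client.
  static boolean matchesAny(Set<String> appTags, Set<String> queryTags) {
    for (String q : queryTags) {
      if (appTags.contains(q)) {
        return true;
      }
    }
    return false;
  }
}
{code}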
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837040#comment-13837040 ] Vinod Kumar Vavilapalli commented on YARN-1463: --- After YARN-1318, the exception message reported is {code} 2013-12-02 22:49:34,492 INFO [Thread-322] service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state STARTED; cause: java.lang.NullPointerException java.lang.NullPointerException at java.util.Hashtable.get(Hashtable.java:334) at java.util.Properties.getProperty(Properties.java:932) at org.apache.hadoop.conf.Configuration.get(Configuration.java:874) at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892) at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101) at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:826) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:477) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:850) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:205) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:118) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:880) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) {code} Haohui/Binglin, can you see if this can be fixed in common itself? If that is the case, we can avoid these YARN specific changes. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837050#comment-13837050 ] qingwu.fu commented on YARN-1458: - Thanks Sandy. We were confused by your point that "If it returns 0 we should just set the fair shares of all the considered schedulables to 0." In our understanding, you suggested setting every app's weight to 0 when one app's weight is 0, so we proposed the idea above. But now we agree with the point that "If size based weight is turned on and an app has 0 demand, I think giving it 0 fair share is the correct thing to do." It's more consistent with the principle of fair share. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
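As an illustration of the zero-demand bail-out being agreed on above, here is a hedged, self-contained sketch; the real ComputeFairShares#computeShares performs a binary search over a weight-to-resource ratio, and the App class and method below are stand-ins, not the actual patch.
{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Schedulable: just a demand and a fair share.
class App {
  int demand;
  int fairShare;
  App(int demand) { this.demand = demand; }
}

public class ZeroDemandGuard {
  // With size-based weight, an app with zero demand gets zero weight; if every
  // app in the queue has zero demand, the ratio search cannot make progress.
  // Guard: on zero aggregate demand, set all fair shares to 0 and skip the loop.
  static boolean assignZeroSharesIfNoDemand(List<App> apps) {
    int totalDemand = 0;
    for (App a : apps) {
      totalDemand += a.demand;
    }
    if (totalDemand != 0) {
      return false; // normal path: run the weight-to-resource ratio search
    }
    for (App a : apps) {
      a.fairShare = 0; // zero demand => zero fair share, per the discussion
    }
    return true;
  }

  public static void main(String[] args) {
    List<App> apps = Arrays.asList(new App(0), new App(0));
    System.out.println(assignZeroSharesIfNoDemand(apps)); // true: loop skipped
  }
}
{code}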
[jira] [Commented] (YARN-1181) Augment MiniYARNCluster to support HA mode
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837051#comment-13837051 ] Karthik Kambatla commented on YARN-1181: The failing tests are unrelated - YARN-1463 and YARN-1464 respectively. Augment MiniYARNCluster to support HA mode -- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch, yarn-1181-2.patch, yarn-1181-3.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-895: - Attachment: YARN-895.4.patch Missed the change in yarn-default.xml RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837054#comment-13837054 ] Mayank Bansal commented on YARN-967: Thanks [~vinodkv] for review. Updated the java docs. Thanks, Mayank [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-967: --- Attachment: YARN-967-13.patch [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-13.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837059#comment-13837059 ] Hadoop QA commented on YARN-895: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616625/YARN-895.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2572//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2572//console This message is automatically generated. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837057#comment-13837057 ] Hadoop QA commented on YARN-967: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616629/YARN-967-13.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2574//console This message is automatically generated. [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, YARN-967-12.patch, YARN-967-13.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837064#comment-13837064 ] Haohui Mai commented on YARN-1463: -- After the discussion with [~vinodkv], this is because the code calls conf.get() on the SPNEGO principal / keytab keys twice. The following patch should fix the problem: {code}
 if (hasSpnegoConf) {
-  builder.setUsernameConfKey(conf.get(spnegoPrincipalKey))
-      .setKeytabConfKey(conf.get(spnegoKeytabKey))
+  builder.setUsernameConfKey(spnegoPrincipalKey)
+      .setKeytabConfKey(spnegoKeytabKey)
       .setSecurityEnabled(UserGroupInformation.isSecurityEnabled());
 }
{code} [~decster], I believe that the null pointer checks are redundant as HttpServer has already covered them. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
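For context on why the double lookup blows up, a minimal standalone reproduction of the failure mode (an assumed reconstruction, not YARN code): the builder expects the configuration key, but the buggy call passes the looked-up value; once that value is null, the second lookup becomes Properties.getProperty(null), and Hashtable.get(null) throws the NullPointerException seen in the stack trace above. The property key below is a made-up stand-in.
{code}
import java.util.Properties;

public class DoubleLookupNpe {
  public static void main(String[] args) {
    // Properties stands in for the Hadoop Configuration here; Properties
    // extends Hashtable, and Hashtable.get(null) throws NullPointerException.
    Properties conf = new Properties();
    String value = conf.getProperty("spnego.principal.key"); // null: key unset
    conf.getProperty(value); // NPE, mirroring Configuration.get -> Hashtable.get
  }
}
{code}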
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837082#comment-13837082 ] Hadoop QA commented on YARN-895: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616628/YARN-895.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2573//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2573//console This message is automatically generated. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated YARN-1463: - Attachment: YARN-1463.000.patch TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-895: - Attachment: YARN-895.5.patch Misunderstood the comment. Updated the patch to sleep for some time for the client to get exceptions. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.5.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1181) Augment MiniYARNCluster to support HA mode
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1181: --- Attachment: yarn-1181-4.patch Rebased on trunk post YARN-1318. Augment MiniYARNCluster to support HA mode -- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch, yarn-1181-2.patch, yarn-1181-3.patch, yarn-1181-4.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837144#comment-13837144 ] Karthik Kambatla commented on YARN-1028: In RMProxy, we build an exceptionToPolicyMap and handle a couple of Exceptions. Is there a particular reason for this? In other words, are there any Exceptions we don't want the default retryPolicy to handle? {code}
Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap =
    new HashMap<Class<? extends Exception>, RetryPolicy>();
exceptionToPolicyMap.put(ConnectException.class, retryPolicy);
// TODO: after HADOOP-9576, IOException can be changed to EOFException
exceptionToPolicyMap.put(IOException.class, retryPolicy);
return RetryPolicies.retryByException(RetryPolicies.TRY_ONCE_THEN_FAIL,
    exceptionToPolicyMap);
{code} In the context of this JIRA, we have one RetryPolicy for the HA case and another for the non-HA case. We'll probably have to add different exceptions based on whether HA is enabled or not. Wondering if it is really required. [~xgong], [~jianhe] - thoughts? Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1125) Add shutdown support to non-service RM components
[ https://issues.apache.org/jira/browse/YARN-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837179#comment-13837179 ] Tsuyoshi OZAWA commented on YARN-1125: -- [~xgong], could you hold off on taking this? Before doing this JIRA, we need to deal with YARN-1139, YARN-1172, and HADOOP-10043. I'm waiting for the reviews. [~kkambatl], could you help me move HADOOP-10043 forward? Add shutdown support to non-service RM components - Key: YARN-1125 URL: https://issues.apache.org/jira/browse/YARN-1125 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong The ResourceManager has certain non-service components like the Scheduler. While transitioning to standby, these components should be completely turned off. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1181) Augment MiniYARNCluster to support HA mode
[ https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837183#comment-13837183 ] Hadoop QA commented on YARN-1181: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616645/yarn-1181-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2575//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2575//console This message is automatically generated. Augment MiniYARNCluster to support HA mode -- Key: YARN-1181 URL: https://issues.apache.org/jira/browse/YARN-1181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1181-1.patch, yarn-1181-2.patch, yarn-1181-3.patch, yarn-1181-4.patch MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837185#comment-13837185 ] Hadoop QA commented on YARN-1463: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616643/YARN-1463.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2576//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2576//console This message is automatically generated. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1301: - Attachment: YARN-1301.5.patch Sorry for the delay; updated the patch to check whether the blacklist additions/removals are null. Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837200#comment-13837200 ] Tsuyoshi OZAWA commented on YARN-1307: -- *ping* any comments are welcome. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch, YARN-1307.4-2.patch, YARN-1307.4-3.patch, YARN-1307.4.patch, YARN-1307.5.patch, YARN-1307.6.patch, YARN-1307.7.patch, YARN-1307.8.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837201#comment-13837201 ] Hadoop QA commented on YARN-895: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616644/YARN-895.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2577//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2577//console This message is automatically generated. RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.5.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837211#comment-13837211 ] Jian He commented on YARN-1028: --- The assumption was to retry certain connection-related exceptions, and maybe later some other types of exceptions. One exception I can find is ApplicationNotFoundException, which should not be retried by the client. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-895) RM crashes if it restarts while the state-store is down
[ https://issues.apache.org/jira/browse/YARN-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837212#comment-13837212 ] Jian He commented on YARN-895: -- test failure not related RM crashes if it restarts while the state-store is down --- Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-895.1.patch, YARN-895.2.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.3.patch, YARN-895.4.patch, YARN-895.4.patch, YARN-895.5.patch, YARN-895.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837218#comment-13837218 ] Xuan Gong commented on YARN-1028: - Yes, I agree with [~jianhe]. We did make that assumption. Basically, whether and which retry policy is chosen is based on the exceptions. In the HA case, I think we do not need to wrap with RetryPolicies.retryByException; directly returning RetryPolicies.failoverOnNetworkException should be enough. Otherwise, we will only retry on ConnectException and IOException. But if we directly use RetryPolicies.failoverOnNetworkException, it will cover many more exceptions. Take a look at FailoverOnNetworkExceptionRetry#shouldRetry(); we can find more exceptions that this retry policy can handle. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
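A hedged sketch of the split being discussed, assuming HA detection via HAUtil and treating the concrete policies, time bounds, and failover count as placeholders rather than the eventual patch:
{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.yarn.conf.HAUtil;

public class RetryPolicySketch {
  // In the HA case, return a failover policy directly: it already retries a
  // broader set of network exceptions (cf. FailoverOnNetworkExceptionRetry
  // #shouldRetry) and triggers proxy failover. Otherwise keep a plain policy.
  static RetryPolicy createRetryPolicy(Configuration conf) {
    RetryPolicy basePolicy = RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        15 * 60 * 1000, 30 * 1000, TimeUnit.MILLISECONDS); // illustrative bounds
    if (HAUtil.isHAEnabled(conf)) {
      return RetryPolicies.failoverOnNetworkException(basePolicy, 30); // illustrative cap
    }
    return basePolicy;
  }
}
{code}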
[jira] [Commented] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837222#comment-13837222 ] Hadoop QA commented on YARN-1301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616668/YARN-1301.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2578//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2578//console This message is automatically generated. Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-291) [Umbrella] Dynamic resource configuration
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837238#comment-13837238 ] Junping Du commented on YARN-291: - Junping, just saw your comments on YARN-999. I can help on it. Thanks! I plan to finish the no-timeout option in Dec, so it would be great for you to help on the timeout part. By the different options above, do you mean overCommitTimeoutMills < 0, = 0, > 0? I want to find more use cases associated with this setting besides graceful decommission. For example, you mentioned preemption for long running tasks in YARN-999; is that part of graceful decommission or a different use case? Yes. The overCommitTimeoutMills value selects among those options: < 0 (or just -1) means we tolerate tasks running to the end even when the node's resources are over-consumed; >= 0 means we only tolerate over-consumption for the time specified in overCommitTimeoutMills. Once the timeout expires, we take aggressive measures (i.e. preempting assigned containers by freezing or killing tasks) to reclaim resources so that the NM's resources are balanced again. Graceful decommission is just a special case of this, where we always set the NM's totalResource to 0 first, so all assigned containers get released after the timeout (unless the timeout is -1). If we set a proper timeout value, the NM gets a chance to finish its running tasks, and their intermediate map output can still be retrieved before the node is decommissioned; that is why we call it graceful. Also, about the August patch CoreAndAdmin.patch (in YARN-291): can you let us know your plan for it? It seems useful for driving graceful decommission from outside the YARN code. Most of the patches are on track: YARN-311 (core changes) is checked in, and YARN-312 (RPC) has been reviewed with a +1. They will land soon. Cheers, [Umbrella] Dynamic resource configuration - Key: YARN-291 URL: https://issues.apache.org/jira/browse/YARN-291 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-CoreAndAdmin.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch The current Hadoop YARN resource management logic assumes per node resource is static during the lifetime of the NM process. Allowing run-time configuration on per node resource will give us finer granularity of resource elasticity. This allows Hadoop workloads to coexist with other workloads on the same hardware efficiently, whether or not the environment is virtualized. More background and design details can be found in attached proposal. -- This message was sent by Atlassian JIRA (v6.1#6144)
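The timeout semantics described above could be sketched as follows; this is purely illustrative — none of this code exists in the attached patches, and every name in it is an assumption:
{code}
public class OverCommitTimeoutSketch {
  // Hypothetical illustration of the overCommitTimeoutMills options
  // discussed above: < 0 tolerates over-commitment forever, >= 0 triggers
  // reclamation (preempt/kill assigned containers) once the grace period
  // elapses, with 0 meaning act immediately.
  static boolean shouldReclaimContainers(long overCommitTimeoutMills,
      long millisSinceOverCommit) {
    if (overCommitTimeoutMills < 0) {
      return false; // e.g. -1: let running tasks finish, never reclaim
    }
    return millisSinceOverCommit >= overCommitTimeoutMills;
  }
}
{code}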
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qingwu.fu updated YARN-1458: Attachment: YARN-1458.patch In the Fair Scheduler, if size based weight is turned on, an endless loop can occur in ComputeFairShares.computeShares (ComputeFairShares.java:102) when the resource demand of every app in a queue is 0. This patch deals with that situation: we let the program break out of the loop when every app's resource demand in the queue is 0, which effectively sets that queue's resource demand to 0. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocks when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
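The guard described in the update above amounts to something like the following sketch against the fair scheduler types of that era; this illustrates the idea, it is not the attached YARN-1458.patch:
{code}
import java.util.Collection;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.Schedulable;

public class ZeroDemandGuardSketch {
  // Illustrative only: short-circuit computeShares-style logic when every
  // app in the queue demands zero resources, since the weight-to-share
  // search can otherwise spin forever under size-based weight.
  static boolean allDemandsZero(Collection<? extends Schedulable> schedulables) {
    for (Schedulable sched : schedulables) {
      if (sched.getDemand().getMemory() > 0) {
        return false;
      }
    }
    return true; // caller should skip share computation for this queue
  }
}
{code}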
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837251#comment-13837251 ] Binglin Chang commented on YARN-1463: - Hi Haohui, I originally did the same as your patch did, but it still failed with other errors on my MacBook Pro. So I added more checks, just as the original code did, and now it passes. {code} Running org.apache.hadoop.yarn.server.TestContainerManagerSecurity Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.663 sec FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.735 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837257#comment-13837257 ] Binglin Chang commented on YARN-1463: - Detail log: 2013-12-03 10:30:44,577 WARN [Thread-321] mortbay.log (Slf4jLog.java:warn(89)) - Failed startup of context org.mortbay.jetty.webapp.WebAppContext@9ba0281{/,file:/Users/decster/projects/hadoop-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/classes/webapps/cluster} javax.servlet.ServletException: javax.servlet.ServletException: Principal not defined in configuration at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:203) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:146) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:224) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:245) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:820) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:471) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:844) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.transitionToActive(RMHAProtocolService.java:187) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceStart(RMHAProtocolService.java:101) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:871) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper$3.run(MiniYARNCluster.java:242) TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837255#comment-13837255 ] qingwu.fu commented on YARN-1458: - Hi Sandy, We have submitted the patch YARN-1458.patch. Please help us review it. It is nice to work with you. Thank you so much! In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocks when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837258#comment-13837258 ] Binglin Chang commented on YARN-1463: - As we can see from the code, HttpServer does not null-check the keys:
{code}
private void initSpnego(Configuration conf, String hostName,
    String usernameConfKey, String keytabConfKey) throws IOException {
  Map<String, String> params = new HashMap<String, String>();
  String principalInConf = conf.get(usernameConfKey);
  if (principalInConf != null && !principalInConf.isEmpty()) {
    params.put("kerberos.principal", SecurityUtil.getServerPrincipal(
        principalInConf, hostName));
  }
  String httpKeytab = conf.get(keytabConfKey);
  if (httpKeytab != null && !httpKeytab.isEmpty()) {
    params.put("kerberos.keytab", httpKeytab);
  }
  params.put(AuthenticationFilter.AUTH_TYPE, "kerberos");
  defineFilter(webAppContext, SPNEGO_FILTER,
      AuthenticationFilter.class.getName(), params, null);
}
{code}
TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837264#comment-13837264 ] Haohui Mai commented on YARN-1463: -- Based on the stack traces, it seems to me that there are two issues here. First, HDFS-5545 introduced a bug that passes null as the configuration keys for the principal / keytab into the HttpServer.Builder. The attached patch fixes the problem. Second, Webapps enables spnego authentication when security is enabled but no principals / keytabs are passed in. This configuration is wrong and it should fail. Therefore, in my opinion it is problematic to mask the failures in WebApps.java. Maybe we should fix the unit test instead. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837269#comment-13837269 ] Tsuyoshi OZAWA commented on YARN-1301: -- [~zjshen], a patch is ready now. Could you review it? Thanks. Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837276#comment-13837276 ] Jeff Zhang commented on YARN-321: - Will this jira be included in the next release? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837278#comment-13837278 ] Binglin Chang commented on YARN-1463: - bq. Webapps enables spnego authentication when security is enabled but no principals / keytabs are passed in. This configuration is wrong and it should fail. I thought the same, but then I looked at the original code:
{code}
if (spnegoPrincipalKey == null
    || conf.get(spnegoPrincipalKey, "").isEmpty()) {
  LOG.warn("Principal for spnego filter is not set");
  initSpnego = false;
}
if (spnegoKeytabKey == null
    || conf.get(spnegoKeytabKey, "").isEmpty()) {
  LOG.warn("Keytab for spnego filter is not set");
  initSpnego = false;
}
{code}
The code logs a WARN instead of an ERROR, which looks like intentional behavior, so I kept the original behavior just to be safe. Thoughts? TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
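For contrast, the stricter fail-fast behavior argued for above would look roughly like the fragment below; it is a sketch of the alternative under discussion, not committed code:
{code}
// Reject the misconfiguration outright instead of logging a WARN and
// silently disabling SPNEGO.
if (spnegoPrincipalKey == null
    || conf.get(spnegoPrincipalKey, "").isEmpty()) {
  throw new IOException("Principal for spnego filter is not set");
}
if (spnegoKeytabKey == null
    || conf.get(spnegoKeytabKey, "").isEmpty()) {
  throw new IOException("Keytab for spnego filter is not set");
}
{code}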
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837281#comment-13837281 ] Jeff Zhang commented on YARN-321: - Another question about this jira: I found that the container logURL is hard-coded there, so users still cannot see the logs of each container (stdout, stderr). Is allowing users to see the logs on the roadmap? And which jira is tracking this? Thanks. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
Junping Du created YARN-1468: Summary: TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837303#comment-13837303 ] Haohui Mai commented on YARN-1463: -- This is fine with me, but the test is broken then. Maybe we can leave a comment there and fix it later on. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837310#comment-13837310 ] Binglin Chang commented on YARN-1463: - bq. but the test is broken then I am sorry, what do you mean? Which test? With my original patch, I didn't see any test fail. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837312#comment-13837312 ] Tsuyoshi OZAWA commented on YARN-1468: -- Maybe this is a timing bug: I cannot reproduce the problem in my local environment. TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1468: - Description: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMDelegationTokenRestoredOnRMRestart(TestRMRestart.java:1259) {code}] was: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! 
junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at
[jira] [Updated] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1468: - Description: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMDelegationTokenRestoredOnRMRestart(TestRMRestart.java:1259) {code} was: Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) {code} Another log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 143.009 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMDelegationTokenRestoredOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 2.077 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMDelegationTokenRestoredOnRMRestart(TestRMRestart.java:1259) {code}] TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. 
Key: YARN-1468 URL: https://issues.apache.org/jira/browse/YARN-1468 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Priority: Critical Log is as following: {code} Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 44.197 sec FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) at
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837320#comment-13837320 ] Haohui Mai commented on YARN-1463: -- I looked into TestContainerManagerSecurity. I'm not familiar with the code, but it seems to me that it is testing the secure setup. The unit test does not pass any principals / keytabs in the configuration, so spnego will always be disabled. I'm not an expert on the YARN code, but it seems to me that you won't be able to get the right token when spnego is disabled. Maybe someone more familiar with the code can comment on this. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837333#comment-13837333 ] Hadoop QA commented on YARN-126: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580129/YARN-126.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2580//console This message is automatically generated. yarn rmadmin help message contains reference to hadoop cli and JT - Key: YARN-126 URL: https://issues.apache.org/jira/browse/YARN-126 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Rémy SAISSY Labels: usability Attachments: YARN-126.patch has option to specify a job tracker and the last line for general command line syntax had bin/hadoop command [genericOptions] [commandOptions] ran yarn rmadmin to get usage: RMAdmin Usage: java RMAdmin [-refreshQueues] [-refreshNodes] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshAdminAcls] [-refreshServiceAcl] [-help [cmd]] Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|jobtracker:port> specify a job tracker -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-862) ResourceManager and NodeManager versions should match on node registration or error out
[ https://issues.apache.org/jira/browse/YARN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837330#comment-13837330 ] Hadoop QA commented on YARN-862: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589256/YARN-862-b0.23-v2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2579//console This message is automatically generated. ResourceManager and NodeManager versions should match on node registration or error out --- Key: YARN-862 URL: https://issues.apache.org/jira/browse/YARN-862 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 0.23.8 Reporter: Robert Parker Assignee: Robert Parker Attachments: YARN-862-b0.23-v1.patch, YARN-862-b0.23-v2.patch For branch-0.23 the versions of the node manager and the resource manager should match to complete a successful registration. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837337#comment-13837337 ] Karthik Kambatla commented on YARN-1028: Verified on a cluster, writing a unit test for the same. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1028: --- Attachment: yarn-1028-2.patch Here is a new patch that introduces a pluggable failover model and fixes the retry mechanism. High level details: # YarnFailoverProxyProvider implements YARN specific failover-proxy-provider from Clients/ AMs/ NMs to connect to the RM. # ConfiguredFailoverProxyProvider extends the pluggable failover-proxy to toggle between RMs # Required changes to RMProxy, ClientRMProxy and ServerRMProxy. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)
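For readers unfamiliar with the pattern borrowed from HDFS, the rough shape of such a provider is sketched below. The class is hypothetical and is not the code in yarn-1028-2.patch; the interface methods follow hadoop-common's FailoverProxyProvider of this era, but treat the details as assumptions:
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.retry.FailoverProxyProvider;

// Hypothetical illustration only, not the attached patch: cycle through a
// fixed list of pre-created RM proxies, failing over round-robin.
public class ExampleRMFailoverProxyProvider<T> implements FailoverProxyProvider<T> {
  private final Class<T> protocol;
  private final List<T> proxies; // one proxy per configured RM address
  private int currentIndex = 0;

  public ExampleRMFailoverProxyProvider(Class<T> protocol, List<T> proxies) {
    this.protocol = protocol;
    this.proxies = proxies;
  }

  @Override
  public Class<T> getInterface() {
    return protocol;
  }

  @Override
  public T getProxy() {
    return proxies.get(currentIndex);
  }

  @Override
  public void performFailover(T currentProxy) {
    // Toggle to the next configured RM; the retry machinery calls this when
    // the retry policy decides a failover is warranted.
    currentIndex = (currentIndex + 1) % proxies.size();
  }

  @Override
  public void close() throws IOException {
    // A real implementation should stop the RPC proxies here.
  }
}
{code}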
[jira] [Commented] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837352#comment-13837352 ] Tsuyoshi OZAWA commented on YARN-1301: -- Do you mean we should record the number of additions/removals? Need to log the blacklist additions/removals when YarnSchedule#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837360#comment-13837360 ] Haohui Mai commented on YARN-1463: -- Just to clarify, I think we can fix the unit test in a separate jira. However, it might be worthwhile to add some comments to explain the situations in the unit test. +1. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837359#comment-13837359 ] Hadoop QA commented on YARN-1028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616692/yarn-1028-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1553 javac compiler warnings (more than the trunk's current 1543 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2581//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2581//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2581//console This message is automatically generated. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1#6144)