[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880797#comment-13880797 ]

Jian He commented on YARN-1618:
-------------------------------

bq. is it still the case that RPC servers are started after recovery is complete?
It is.

bq. The START should come almost immediately after the RMAppImpl object is created in a NEW state during regular app submission. Karthik, are we sure that this happened?
Yes, it is.

bq. There is no need for history for an app that was never submitted successfully to the RM.
I agree. We don't need to save the final state of the app if the app is not even accepted by the RM.

bq. If we don't want the store to be touched until the app is SUBMITTED/ACCEPTED (X), we should probably replace the existing NEW_SAVING state with a corresponding X_SAVING state, and re-jig the transitions to directly go to KILLED/FAILED from any of the states before this X_SAVING state.
Regarding the two approaches Karthik proposed, I'm in favor of the 1st one.

Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
------------------------------------------------------------------------------------------------------------

Key: YARN-1618
URL: https://issues.apache.org/jira/browse/YARN-1618
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1618-1.patch

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.

Previous description: ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880813#comment-13880813 ]

Bikas Saha commented on YARN-1618:
----------------------------------

All we need to do is go from NEW to KILLED on the KILL event and ignore the START event in the KILLED state.

Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
------------------------------------------------------------------------------------------------------------

Key: YARN-1618
URL: https://issues.apache.org/jira/browse/YARN-1618
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1618-1.patch

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.

Previous description: ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
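To make the proposed transition concrete, here is a minimal, self-contained sketch of the behavior Bikas describes. It is a toy model, not the actual RMAppImpl state machine (the real one is built with YARN's StateMachineFactory, and all names here are illustrative): a KILL event moves NEW directly to KILLED with no state-store write, and a late START event in KILLED is dropped.

{code}
// Toy model of the proposed transitions; names are illustrative.
public class AppStateSketch {
  enum State { NEW, SUBMITTED, KILLED }
  enum Event { START, KILL }

  private State state = State.NEW;

  void handle(Event event) {
    switch (state) {
      case NEW:
        if (event == Event.KILL) {
          state = State.KILLED;    // straight to KILLED, no FINAL_SAVING
        } else if (event == Event.START) {
          state = State.SUBMITTED; // normal submission path
        }
        break;
      case KILLED:
        // A START event arriving after the kill is ignored.
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) {
    AppStateSketch app = new AppStateSketch();
    app.handle(Event.KILL);        // NEW -> KILLED, store untouched
    app.handle(Event.START);       // ignored in KILLED
    System.out.println(app.state); // KILLED
  }
}
{code}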
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenji Kikushima updated YARN-1480:
----------------------------------

Attachment: YARN-1480-3.patch

Added a test for an invalid finalStatus, modeled on the appStates test case. Thanks for your confirmation and suggestion.

RM web services getApps() accepts many more filters than ApplicationCLI list command
-------------------------------------------------------------------------------------

Key: YARN-1480
URL: https://issues.apache.org/jira/browse/YARN-1480
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480.patch

Nowadays RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1631) Container allocation issue in LeafQueue assignContainers()
Sunil G created YARN-1631:
--------------------------

Summary: Container allocation issue in LeafQueue assignContainers()
Key: YARN-1631
URL: https://issues.apache.org/jira/browse/YARN-1631
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: SuSe 11 Linux
Reporter: Sunil G

Application1 has a demand of 8GB (its map task size is 8GB), which is more than Node_1 can now provide: Node_1 has a capacity of 8GB, of which 2GB is used by Application1's AM, so Application1 reserved the remaining 6GB on Node_1. A new job, Application2, is then submitted with a 2GB AM, 2GB tasks, and only 2 maps to run. Node_2 also has 8GB capacity, but Application2's AM cannot be launched on Node_2, and Application2 waits a long time because only 2 nodes are available in the cluster.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1631) Container allocation issue in LeafQueue assignContainers()
[ https://issues.apache.org/jira/browse/YARN-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-1631:
--------------------------

Attachment: Yarn-1631.1.patch

As per LeafQueue.assignContainers(), applications are fetched from activeApplications. In this scenario, Application1 is fetched first when the node update for Node_2 arrives, so the check below in assignContainers() fails:

{code}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
  return NULL_ASSIGNMENT;
}
{code}

Here the queue usage was crossing its limit, but because of the return statement, the loop never tries the second application. Application2 needs only 2GB to launch, and that could have been satisfied on Node_2. So instead of returning NULL_ASSIGNMENT, it is better to break out of the inner loop; the user limit check already breaks from the inner loop in the same way. Kindly check this patch and please share your thoughts.

Container allocation issue in LeafQueue assignContainers()
----------------------------------------------------------

Key: YARN-1631
URL: https://issues.apache.org/jira/browse/YARN-1631
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: SuSe 11 Linux
Reporter: Sunil G
Attachments: Yarn-1631.1.patch

Application1 has a demand of 8GB (its map task size is 8GB), which is more than Node_1 can now provide: Node_1 has a capacity of 8GB, of which 2GB is used by Application1's AM, so Application1 reserved the remaining 6GB on Node_1. A new job, Application2, is then submitted with a 2GB AM, 2GB tasks, and only 2 maps to run. Node_2 also has 8GB capacity, but Application2's AM cannot be launched on Node_2, and Application2 waits a long time because only 2 nodes are available in the cluster.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
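The control-flow change being proposed can be illustrated with a small, self-contained toy (the class and method names below are hypothetical, not the real LeafQueue code): when one application's demand exceeds the queue headroom, skip just that application rather than abandoning the whole node update.

{code}
import java.util.List;

// Toy illustration of "skip the application" vs. "return NULL_ASSIGNMENT".
public class AssignContainersSketch {
  record App(String name, int demandGB) {}

  static final int QUEUE_HEADROOM_GB = 4;

  static String assign(List<App> activeApplications) {
    for (App app : activeApplications) {
      if (app.demandGB() > QUEUE_HEADROOM_GB) {
        // Returning here (the old NULL_ASSIGNMENT behavior) would
        // starve every later application; skipping does not.
        continue;
      }
      return app.name(); // first application that fits gets the node
    }
    return null; // nothing fit on this node update
  }

  public static void main(String[] args) {
    List<App> apps = List.of(new App("Application1", 8),
                             new App("Application2", 2));
    System.out.println(assign(apps)); // Application2
  }
}
{code}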
[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881134#comment-13881134 ]

Kihwal Lee commented on YARN-1575:
----------------------------------

+1 The patch looks good. The use of synchronizedMap and double locking is well justified.

Public localizer crashes with Localized unkown resource
--------------------------------------------------------

Key: YARN-1575
URL: https://issues.apache.org/jira/browse/YARN-1575
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch

The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
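For readers curious what "synchronizedMap and double locking" buys here, the sketch below reproduces the pattern in a self-contained toy (hypothetical names, not the actual ResourceLocalizationService code): submitting the download and recording its Future in the pending map happen under one lock, so the completion side can never observe a finished Future that was not yet registered and report an unknown resource.

{code}
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PublicLocalizerSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Map<Future<Path>, String> pending =
      Collections.synchronizedMap(new HashMap<>());

  Future<Path> addResource(String resource) {
    synchronized (pending) {       // the lock spans submit + put
      Future<Path> fetch = pool.submit(() -> Path.of("/cache", resource));
      pending.put(fetch, resource);
      return fetch;
    }
  }

  String onCompleted(Future<Path> fetch) {
    String resource = pending.remove(fetch);
    if (resource == null) {
      // Without the locking above, this is the crash in the report.
      throw new IllegalStateException("Localized unknown resource");
    }
    return resource;
  }

  public static void main(String[] args) throws Exception {
    PublicLocalizerSketch localizer = new PublicLocalizerSketch();
    Future<Path> fetch = localizer.addResource("job.jar");
    fetch.get();
    System.out.println(localizer.onCompleted(fetch)); // job.jar
    localizer.pool.shutdown();
  }
}
{code}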
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881180#comment-13881180 ]

Jonathan Eagles commented on YARN-1479:
---------------------------------------

Thanks, Chen. A couple of minor things and a question for you.
* There are a couple of unnecessary imports in TestApplicationMasterService. Let's get those cleaned up before this patch goes in.
* progressCheck - the function will be better off package-private, since the intention is not to advertise new functionality.
* progressCheck - this function should be renamed, since "check" suggests a question rather than an indication that something is being modified. Perhaps progressFilter, or hopefully you can think of something better.

Invalid NaN values in Hadoop REST API JSON response
---------------------------------------------------

Key: YARN-1479
URL: https://issues.apache.org/jira/browse/YARN-1479
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
Fix For: 2.4.0
Attachments: Yarn-1479.patch

I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON, for example: progress:NaN. NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN".

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881354#comment-13881354 ]

Cindy Li commented on YARN-1525:
--------------------------------

[~xgong], thanks for your comments. In the current code, it just loops through all the available RM ids. There is already an rpcTimeoutForChecks when querying the HA state; do we need to set an additional maximum waiting time here too?

@Karthik, thanks for your comments. I can move it into HAUtil.

Web UI should redirect to active RM when HA is enabled.
-------------------------------------------------------

Key: YARN-1525
URL: https://issues.apache.org/jira/browse/YARN-1525
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
Attachments: YARN1525.patch.v1, YARN1525.patch.v2

When failover happens, the web UI should redirect to the current active RM.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1617) Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
[ https://issues.apache.org/jira/browse/YARN-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881420#comment-13881420 ]

Karthik Kambatla commented on YARN-1617:
----------------------------------------

+1

Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
---------------------------------------------------------------------------

Key: YARN-1617
URL: https://issues.apache.org/jira/browse/YARN-1617
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1617.patch

{code}
synchronized private void allocate(Container container) {
  // Update consumption and track allocations
  //TODO: fixme sharad
  /* try {
       store.storeContainer(container);
     } catch (IOException ie) {
       // TODO fix this. we shouldnt ignore
     }*/
  LOG.debug("allocate: applicationId=" + applicationId
      + " container=" + container.getId()
      + " host=" + container.getNodeId().toString());
}
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
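A plausible cleaned-up version of the method follows. This is a sketch of what the patch presumably does, not the committed diff: the dead storeContainer() block is removed and the debug statement is guarded so the string concatenation only runs when debug logging is enabled. The parameters stand in for the real fields of AppSchedulingInfo.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AllocateLogSketch {
  private static final Log LOG = LogFactory.getLog(AllocateLogSketch.class);

  synchronized void allocate(String applicationId, String containerId,
                             String host) {
    // Update consumption and track allocations
    if (LOG.isDebugEnabled()) {
      // Guarded: the concatenation is skipped when debug is off.
      LOG.debug("allocate: applicationId=" + applicationId
          + " container=" + containerId + " host=" + host);
    }
  }
}
{code}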
[jira] [Created] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
Chen He created YARN-1632:
--------------------------

Summary: TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
Key: YARN-1632
URL: https://issues.apache.org/jira/browse/YARN-1632
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0, 0.23.9
Reporter: Chen He
Assignee: Chen He
Priority: Minor

ApplicationMasterService is under the org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under the org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package, which only contains one file (TestApplicationMasterService).

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881499#comment-13881499 ]

Chen He commented on YARN-1479:
-------------------------------

Hi [~jeagles], thank you for your suggestions. I will answer your questions one by one.

{quote}There are a couple of unnecessary imports in TestApplicationMasterService. Let's get those cleaned up before this patch goes in.{quote}
I have removed those unnecessary imports.

{quote}progressCheck - the function will be better off package-private since the intention is not to advertise new functionality{quote}
{quote}progressCheck - this function should be renamed since check is a question and not an indication something is being modified. Perhaps progressFilter or hopefully you can think of something better.{quote}
If progressCheck is package-private, it cannot be directly called in TestApplicationMasterService after YARN-1632. I will remove the progressCheck method in the yarn-1479v2.patch and migrate its code into the ApplicationMasterService.allocate() method. Then we only need the testAllocate() method in TestApplicationMasterService.

Invalid NaN values in Hadoop REST API JSON response
---------------------------------------------------

Key: YARN-1479
URL: https://issues.apache.org/jira/browse/YARN-1479
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
Fix For: 2.4.0
Attachments: Yarn-1479.patch

I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON, for example: progress:NaN. NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN".

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
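The filtering logic under discussion could look like the following self-contained sketch (a hypothetical helper; per the comment above, the real patch inlines the logic into ApplicationMasterService.allocate()): clamp any non-finite or out-of-range float progress into [0, 1] so NaN can never reach the REST JSON.

{code}
public final class ProgressFilterSketch {
  static float filterProgress(float progress) {
    if (Float.isNaN(progress) || progress < 0f) {
      return 0f;   // treat NaN and negative values as "no progress"
    }
    if (progress > 1f) {
      return 1f;   // covers positive infinity as well
    }
    return progress;
  }

  public static void main(String[] args) {
    System.out.println(filterProgress(Float.NaN));               // 0.0
    System.out.println(filterProgress(Float.POSITIVE_INFINITY)); // 1.0
    System.out.println(filterProgress(0.42f));                   // 0.42
  }
}
{code}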
[jira] [Assigned] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai reassigned YARN-1600:
--------------------------------

Assignee: Haohui Mai

RM does not startup when security is enabled without spnego configured
-----------------------------------------------------------------------

Key: YARN-1600
URL: https://issues.apache.org/jira/browse/YARN-1600
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker

We have a custom auth filter in front of our various UI pages that handles user authentication. However currently the RM assumes that if security is enabled then the user must have configured spnego as well for the RM web pages, which is not true in our case.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881507#comment-13881507 ]

Sandy Ryza commented on YARN-1630:
----------------------------------

Can the config be added as a timeout instead of a number of polls? Also, a couple of nits:
* The timeout should default to -1, meaning wait forever.
* No need for the message var; just LOG.info directly.
* A (Yarn)exception should be thrown to indicate that the operation didn't complete in time. Otherwise clients might think it had completed successfully.

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1633) Define the entity, entity-info and event objects
Vinod Kumar Vavilapalli created YARN-1633:
------------------------------------------

Summary: Define the entity, entity-info and event objects
Key: YARN-1633
URL: https://issues.apache.org/jira/browse/YARN-1633
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

Define the core objects of the application-timeline effort.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1633) Define the entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-1633:
------------------------------------------

Issue Type: Sub-task (was: Bug)
Parent: YARN-1530

Define the entity, entity-info and event objects
------------------------------------------------

Key: YARN-1633
URL: https://issues.apache.org/jira/browse/YARN-1633
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

Define the core objects of the application-timeline effort.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1634) Define a ApplicationTimelineStore interface and an in-memory implementation
Vinod Kumar Vavilapalli created YARN-1634:
------------------------------------------

Summary: Define a ApplicationTimelineStore interface and an in-memory implementation
Key: YARN-1634
URL: https://issues.apache.org/jira/browse/YARN-1634
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

As per the design doc, the store needs to be pluggable. We need a base interface, and an in-memory implementation for testing.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
Vinod Kumar Vavilapalli created YARN-1635:
------------------------------------------

Summary: Implement a Leveldb based ApplicationTimelineStore
Key: YARN-1635
URL: https://issues.apache.org/jira/browse/YARN-1635
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli

As per the design doc, we need a levelDB + local-filesystem based implementation to start with and for small deployments.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+events
Vinod Kumar Vavilapalli created YARN-1636:
------------------------------------------

Summary: Implement timeline related web-services inside AHS for storing and retrieving entities+events
Key: YARN-1636
URL: https://issues.apache.org/jira/browse/YARN-1636
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned YARN-1635:
---------------------------------------------

Assignee: Vinod Kumar Vavilapalli

Implement a Leveldb based ApplicationTimelineStore
--------------------------------------------------

Key: YARN-1635
URL: https://issues.apache.org/jira/browse/YARN-1635
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

As per the design doc, we need a levelDB + local-filesystem based implementation to start with and for small deployments.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1637) Implement a client library for java users to post entities+events
Vinod Kumar Vavilapalli created YARN-1637:
------------------------------------------

Summary: Implement a client library for java users to post entities+events
Key: YARN-1637
URL: https://issues.apache.org/jira/browse/YARN-1637
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

This is a wrapper around the web-service to facilitate easy posting of entity+event data to the time-line server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1638) Add an integration test validating post, storage and retrieval of entities+events
Vinod Kumar Vavilapalli created YARN-1638:
------------------------------------------

Summary: Add an integration test validating post, storage and retrieval of entities+events
Key: YARN-1638
URL: https://issues.apache.org/jira/browse/YARN-1638
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai updated YARN-1600:
-----------------------------

Attachment: YARN-1600.000.patch

This patch ports the solution from the earlier patches of YARN-1463.

RM does not startup when security is enabled without spnego configured
-----------------------------------------------------------------------

Key: YARN-1600
URL: https://issues.apache.org/jira/browse/YARN-1600
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker
Attachments: YARN-1600.000.patch

We have a custom auth filter in front of our various UI pages that handles user authentication. However currently the RM assumes that if security is enabled then the user must have configured spnego as well for the RM web pages, which is not true in our case.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen He updated YARN-1632:
--------------------------

Attachment: yarn-1632.patch

TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
----------------------------------------------------------------------------------------------------

Key: YARN-1632
URL: https://issues.apache.org/jira/browse/YARN-1632
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
Attachments: yarn-1632.patch

ApplicationMasterService is under the org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under the org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package, which only contains one file (TestApplicationMasterService).

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cindy Li updated YARN-1525:
---------------------------

Attachment: YARN1525.patch.v3

I've added a test case file, TestRMWebHA.java, and an HA util file named RMHAUtils.java. It is easier to add a new HA util file specifically for the RM than to add the getActiveRMId functions to HAUtils.java, which is part of the yarn conf package. Having talked with Xuan and Vinod offline, we don't need to add a maximum waiting time here, as the scan for the active RM only happens once; if no active RM is found, it returns null. I've also minimized the format-related changes in the new patch.

Web UI should redirect to active RM when HA is enabled.
-------------------------------------------------------

Key: YARN-1525
URL: https://issues.apache.org/jira/browse/YARN-1525
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
Attachments: YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3

When failover happens, the web UI should redirect to the current active RM.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
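The single-scan lookup described above might look like this self-contained sketch (the signatures are hypothetical; the real helper lives in RMHAUtils and queries each RM's HA state over RPC): check each configured RM id once and return the first one that reports ACTIVE, or null when none does.

{code}
import java.util.List;
import java.util.function.Function;

public class RMHAUtilsSketch {
  enum HAState { ACTIVE, STANDBY, UNREACHABLE }

  static String findActiveRMId(List<String> rmIds,
                               Function<String, HAState> queryState) {
    for (String rmId : rmIds) {
      if (queryState.apply(rmId) == HAState.ACTIVE) {
        return rmId; // redirect the web UI to this RM
      }
    }
    return null; // a single pass, no retries, hence no max-wait needed
  }

  public static void main(String[] args) {
    String active = findActiveRMId(List.of("rm1", "rm2"),
        id -> id.equals("rm2") ? HAState.ACTIVE : HAState.STANDBY);
    System.out.println(active); // rm2
  }
}
{code}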
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Acharya updated YARN-1630:
---------------------------------

Attachment: (was: diff.txt)

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Acharya updated YARN-1630:
---------------------------------

Attachment: diff.txt

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1639) YARN RM HA requires different configs on different RM hosts
Arpit Gupta created YARN-1639:
------------------------------

Summary: YARN RM HA requires different configs on different RM hosts
Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881568#comment-13881568 ]

Sandy Ryza commented on YARN-1630:
----------------------------------

Thanks, Aditya. To make the naming less ambiguous, we should call the property client.application-client-protocol.poll-timeout-ms. After that, LGTM.

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
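Sandy's suggestions (a millisecond timeout defaulting to -1 for "wait forever", plus an exception on expiry) could be wired up roughly as below. This is a self-contained sketch with hypothetical names, not the YarnClientImpl patch; the real code would throw a YarnException.

{code}
public class KillPollSketch {
  // Stand-in for the real "fetch the application report" RPC.
  static boolean isKilled(String appId, int attempt) {
    return attempt >= 5; // pretend the kill is visible on the 5th poll
  }

  static void waitForKill(String appId, long pollTimeoutMs,
                          long pollIntervalMs) throws Exception {
    long deadline = System.currentTimeMillis() + pollTimeoutMs;
    int attempt = 0;
    while (!isKilled(appId, attempt++)) {
      if (pollTimeoutMs >= 0 && System.currentTimeMillis() >= deadline) {
        // Throwing keeps callers from mistaking a timeout for a kill.
        throw new Exception("Timed out waiting for " + appId
            + " to be killed");
      }
      Thread.sleep(pollIntervalMs); // a -1 timeout means poll forever
    }
  }

  public static void main(String[] args) throws Exception {
    waitForKill("application_1389036507624_0018", 10_000, 100);
    System.out.println("killed");
  }
}
{code}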
[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881576#comment-13881576 ]

Karthik Kambatla commented on YARN-1639:
----------------------------------------

Agreed: being able to use the same configs for both RMs would be simpler to deploy. While working on YARN-1232, I discussed this with Bikas and Alejandro. I can't remember exactly why, but there was a reason we decided to go with different ha.ids/configs. [~bikassaha], [~tucu00] - do you remember why?

YARN RM HA requires different configs on different RM hosts
------------------------------------------------------------

Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1639:
-----------------------------------

Issue Type: Sub-task (was: Improvement)
Parent: YARN-149

YARN RM HA requires different configs on different RM hosts
------------------------------------------------------------

Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881577#comment-13881577 ]

Karthik Kambatla commented on YARN-1639:
----------------------------------------

In any case, if we decide to go ahead with simplifying this, can we do it such that explicitly specifying the ha.id also works?

YARN RM HA requires different configs on different RM hosts
------------------------------------------------------------

Key: YARN-1639
URL: https://issues.apache.org/jira/browse/YARN-1639
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on whether the host should be the first or the second RM. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs; it would be better to have the same setup for the RM, as this would make installation and management easier.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
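One way to reconcile identical config files with Karthik's request that an explicit ha.id still win is sketched below. This is purely illustrative, under assumed semantics, and not what was eventually committed: use the configured id when present, otherwise infer it by matching the local hostname against the per-RM addresses.

{code}
import java.net.InetAddress;
import java.util.Map;

public class RMIdResolverSketch {
  static String resolveRMId(String explicitId,
                            Map<String, String> rmAddresses) throws Exception {
    if (explicitId != null) {
      return explicitId; // an explicit yarn.resourcemanager.ha.id wins
    }
    String localHost = InetAddress.getLocalHost().getHostName();
    for (Map.Entry<String, String> entry : rmAddresses.entrySet()) {
      if (entry.getValue().startsWith(localHost)) {
        return entry.getKey(); // the same config works on both hosts
      }
    }
    throw new Exception("Cannot determine the local RM id");
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> addrs = Map.of("rm1", "rm-host-1:8032",
                                       "rm2", "rm-host-2:8032");
    System.out.println(resolveRMId("rm2", addrs)); // rm2
  }
}
{code}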
[jira] [Created] (YARN-1640) Manual Failover does not work in secure clusters
Xuan Gong created YARN-1640:
----------------------------

Summary: Manual Failover does not work in secure clusters
Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1640) Manual Failover does not work in secure clusters
[ https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881592#comment-13881592 ]

Karthik Kambatla commented on YARN-1640:
----------------------------------------

[~xgong] - can you confirm that YARN-1598 is part of what you are testing?

Manual Failover does not work in secure clusters
-------------------------------------------------

Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

The NodeManager gets rejected after manually making one RM active.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1640) Manual Failover does not work in secure clusters
[ https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1640:
----------------------------

Description: The NodeManager gets rejected after manually making one RM active.

Manual Failover does not work in secure clusters
-------------------------------------------------

Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

The NodeManager gets rejected after manually making one RM active.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1640) Manual Failover does not work in secure clusters
[ https://issues.apache.org/jira/browse/YARN-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881594#comment-13881594 ]

Xuan Gong commented on YARN-1640:
---------------------------------

I think so; I am using the latest trunk code.

Manual Failover does not work in secure clusters
-------------------------------------------------

Key: YARN-1640
URL: https://issues.apache.org/jira/browse/YARN-1640
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

The NodeManager gets rejected after manually making one RM active.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1641) ZK store - periodically create a dummy file to kick in fencing
Karthik Kambatla created YARN-1641:
-----------------------------------

Summary: ZK store - periodically create a dummy file to kick in fencing
Key: YARN-1641
URL: https://issues.apache.org/jira/browse/YARN-1641
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

Fencing in the ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume it is the Active. By periodically writing a file (say, every RM_ZK_TIMEOUT_MS milliseconds), we can ensure it gets fenced.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
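The periodic write described above could be driven by a simple timer, as in this self-contained sketch (the interface and names are hypothetical; the real store would reuse its fenced ZooKeeper write path): a fenced-out RM fails the dummy write and learns immediately that it is no longer the Active.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FencingHeartbeatSketch {
  interface FencedStore {
    void writeDummyNode() throws Exception; // fails once fenced out
  }

  static ScheduledExecutorService start(FencedStore store, long zkTimeoutMs) {
    ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    timer.scheduleAtFixedRate(() -> {
      try {
        store.writeDummyNode(); // goes through the fenced write path
      } catch (Exception fenced) {
        // Losing the write means another RM took over the store; the
        // real RM would transition itself to standby here.
        timer.shutdown();
      }
    }, zkTimeoutMs, zkTimeoutMs, TimeUnit.MILLISECONDS);
    return timer;
  }
}
{code}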
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881600#comment-13881600 ]

Cindy Li commented on YARN-1525:
--------------------------------

I'll work on a new ticket for the web service redirection part.

Web UI should redirect to active RM when HA is enabled.
-------------------------------------------------------

Key: YARN-1525
URL: https://issues.apache.org/jira/browse/YARN-1525
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
Attachments: YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3

When failover happens, the web UI should redirect to the current active RM.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881672#comment-13881672 ]

Hadoop QA commented on YARN-1632:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12625145/yarn-1632.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2931//console

This message is automatically generated.

TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
----------------------------------------------------------------------------------------------------

Key: YARN-1632
URL: https://issues.apache.org/jira/browse/YARN-1632
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
Attachments: yarn-1632.patch

ApplicationMasterService is under the org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under the org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package, which only contains one file (TestApplicationMasterService).

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1573) ZK store should use a private password for root-node-acls
[ https://issues.apache.org/jira/browse/YARN-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881687#comment-13881687 ]

Hudson commented on YARN-1573:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/])
YARN-1573. ZK store should use a private password for root-node-acls. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java

ZK store should use a private password for root-node-acls
----------------------------------------------------------

Key: YARN-1573
URL: https://issues.apache.org/jira/browse/YARN-1573
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1573-1.patch, yarn-1573-2.patch

Currently, when HA is enabled, ZK store uses cluster-timestamp as the password for root node ACLs to give the Active RM exclusive access to the store. A more private value like a random number might be better.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
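The change amounts to replacing a guessable secret with a random one. The snippet below is an illustration of that idea, not the committed ZKRMStateStore code: generate a random digest-auth password for the root-node ACL that only the Active RM knows.

{code}
import java.security.SecureRandom;
import java.util.Base64;

public class RootNodeAclSketch {
  static String newRootNodePassword() {
    byte[] secret = new byte[16];
    new SecureRandom().nextBytes(secret); // unpredictable, unlike a timestamp
    return Base64.getEncoder().encodeToString(secret);
  }

  public static void main(String[] args) {
    // Used as the digest-auth password that only the Active RM knows.
    System.out.println(newRootNodePassword());
  }
}
{code}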
[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881685#comment-13881685 ]

Hudson commented on YARN-1575:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/])
YARN-1575. Public localizer crashes with Localized unkown resource. Contributed by Jason Lowe (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561110)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java

Public localizer crashes with Localized unkown resource
--------------------------------------------------------

Key: YARN-1575
URL: https://issues.apache.org/jira/browse/YARN-1575
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Fix For: 3.0.0, 2.4.0, 0.23.11
Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch

The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881699#comment-13881699 ]

Hadoop QA commented on YARN-1628:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12624860/YARN-1628.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2932//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2932//console

This message is automatically generated.

TestContainerManagerSecurity fails on trunk
-------------------------------------------

Key: YARN-1628
URL: https://issues.apache.org/jira/browse/YARN-1628
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: YARN-1628.patch

The test fails with the following error:
{noformat}
java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
	at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
	at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
	at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
	at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881714#comment-13881714 ]

Hadoop QA commented on YARN-1630:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12625151/diff.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2934//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2934//console

This message is automatically generated.

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
------------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881715#comment-13881715 ]

Hadoop QA commented on YARN-1629:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12624937/YARN-1629-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2933//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2933//console

This message is automatically generated.

IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
------------------------------------------------------------------

Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881719#comment-13881719 ]

Hadoop QA commented on YARN-1480:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12625013/YARN-1480-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2935//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2935//console

This message is automatically generated.

RM web services getApps() accepts many more filters than ApplicationCLI list command
-------------------------------------------------------------------------------------

Key: YARN-1480
URL: https://issues.apache.org/jira/browse/YARN-1480
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480.patch

Nowadays RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)