[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880029#comment-13880029 ]

Akira AJISAKA commented on YARN-1480:
-------------------------------------

Thanks for renewing the patch! I built the documentation and it looks good. I also confirmed the CLI behavior.
Would you add a test case for an invalid finalStatus, like the one for invalid appStates?

RM web services getApps() accepts many more filters than ApplicationCLI list command
------------------------------------------------------------------------------------

Key: YARN-1480
URL: https://issues.apache.org/jira/browse/YARN-1480
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
Attachments: YARN-1480-2.patch, YARN-1480.patch

Nowadays RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1628) TestContainerManagerSecurity fails on trunk
Mit Desai created YARN-1628:
----------------------------

Summary: TestContainerManagerSecurity fails on trunk
Key: YARN-1628
URL: https://issues.apache.org/jira/browse/YARN-1628
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai

The test fails with the following error:
{noformat}
java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
{noformat}
[jira] [Updated] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1628:
----------------------------

Attachment: YARN-1628.patch

Attaching the patch. InvalidHost was not a valid argument, which resulted in an UnknownHostException being thrown.

TestContainerManagerSecurity fails on trunk
-------------------------------------------

Key: YARN-1628
URL: https://issues.apache.org/jira/browse/YARN-1628
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: YARN-1628.patch

The test fails with the following error:
{noformat}
java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
    at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
    at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
{noformat}
[jira] [Created] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
Sandy Ryza created YARN-1629:
-----------------------------

Summary: IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.
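
The failure mode in this report, an element being removed from a list while a loop is still walking that list, can be illustrated with a small standalone sketch. This is not the MaxRunningAppsEnforcer code or the attached patch: the class and method names (PendingAppsSketch, promoteEligible) and the use of plain strings as apps are assumptions made purely for illustration. The sketch removes entries through the iterator itself, so the traversal stays valid even when the second-to-last entry is taken out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class PendingAppsSketch {

    // Hypothetical helper: moves every eligible app out of 'pending'
    // and returns the moved apps in their original order.
    static List<String> promoteEligible(List<String> pending, Set<String> eligible) {
        List<String> runnable = new ArrayList<>();
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            String app = it.next();
            if (eligible.contains(app)) {
                runnable.add(app);
                // Removing through the iterator keeps its cursor consistent
                // with the shrinking list.
                it.remove();
            }
        }
        return runnable;
    }

    public static void main(String[] args) {
        // "app2" is the second-to-last pending app, as in the report.
        List<String> pending = new ArrayList<>(Arrays.asList("app1", "app2", "app3"));
        Set<String> eligible = new HashSet<>(Arrays.asList("app2"));
        List<String> runnable = promoteEligible(pending, eligible);
        System.out.println("runnable=" + runnable + " pending=" + pending);
    }
}
```

An alternative with the same effect is to collect the apps to promote during the loop and remove them from the pending list only after the loop has finished.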
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Acharya updated YARN-1630:
---------------------------------

Attachment: diff.txt

Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
-----------------------------------------------------------------------------------

Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed."

The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.
[jira] [Created] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
Aditya Acharya created YARN-1630:
---------------------------------

Summary: Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
Key: YARN-1630
URL: https://issues.apache.org/jira/browse/YARN-1630
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
Attachments: diff.txt

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed."

The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.
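
The idea of bounding the number of retries can be sketched as follows. This is not the content of diff.txt and it does not call any YarnClient API: BoundedKillWait, waitUntilKilled, and the BooleanSupplier that stands in for "does the latest response report the app as killed" are all hypothetical names chosen for illustration, as are the attempt count and poll interval.

```java
import java.util.function.BooleanSupplier;

public class BoundedKillWait {

    // Hypothetical helper: checks 'isKilled' at most maxAttempts times,
    // sleeping pollIntervalMs between checks. Returns true as soon as the
    // condition is observed, false once the bound is reached or the
    // thread is interrupted.
    static boolean waitUntilKilled(BooleanSupplier isKilled, int maxAttempts, long pollIntervalMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isKilled.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pollIntervalMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a server that never reports the app as killed:
        // the loop gives up after three checks instead of hanging.
        boolean observed = waitUntilKilled(() -> false, 3, 10L);
        System.out.println("kill observed: " + observed);
    }
}
```

With a bound in place the caller has to decide what a false return means, for example logging a warning or surfacing an error, rather than looping forever.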
[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1629:
-----------------------------

Attachment: YARN-1629.patch

IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
------------------------------------------------------------------

Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1629.patch

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.
[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1629:
-----------------------------

Attachment: YARN-1629-1.patch

IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
------------------------------------------------------------------

Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1629-1.patch, YARN-1629.patch

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.
[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1629:
-----------------------------

Attachment: YARN-1629-2.patch

IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
------------------------------------------------------------------

Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.
[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880466#comment-13880466 ]

Alejandro Abdelnur commented on YARN-1629:
------------------------------------------

+1

IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
------------------------------------------------------------------

Key: YARN-1629
URL: https://issues.apache.org/jira/browse/YARN-1629
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880725#comment-13880725 ]

Vinod Kumar Vavilapalli commented on YARN-1618:
-----------------------------------------------

Sorry, Jian is away and I'm tied up with other things.

The point about saving the app before the scheduler acknowledges it is a known issue. If that is the only issue, we can close this as a duplicate of YARN-1507, which already exists.

I'll need to look carefully, but this is why I think we did this: because the save happens asynchronously, an app may get killed while it is being persisted but not yet acknowledged. That's why it moves to the FINAL_SAVING state first. Obviously, that also means there are cases like this one, where the original save request hasn't reached the state-store yet.

Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
-----------------------------------------------------------------------------------------------------------

Key: YARN-1618
URL: https://issues.apache.org/jira/browse/YARN-1618
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1618-1.patch

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.

Previous description:
ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.
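
The difference between an update that requires an existing entry and the create-if-missing behavior suggested in the previous description can be sketched with an in-memory map. This is only an illustration under stated assumptions: UpsertStoreSketch and its methods are hypothetical names, the map stands in for the state-store, and nothing here uses the ZooKeeper or RMStateStore APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertStoreSketch {

    // In-memory stand-in for the state-store: application id -> saved state.
    private final Map<String, String> entries = new HashMap<>();

    // Strict update: fails when the entry was never created, which mirrors
    // an update arriving before the original save.
    void updateStrict(String appId, String state) {
        if (!entries.containsKey(appId)) {
            throw new IllegalStateException("No entry for " + appId);
        }
        entries.put(appId, state);
    }

    // Tolerant update: creates the entry when it is missing.
    // Returns true if the entry had to be created.
    boolean updateOrCreate(String appId, String state) {
        return entries.put(appId, state) == null;
    }

    String get(String appId) {
        return entries.get(appId);
    }

    public static void main(String[] args) {
        UpsertStoreSketch store = new UpsertStoreSketch();
        boolean created = store.updateOrCreate("app_1", "KILLED");
        System.out.println("created=" + created + " state=" + store.get("app_1"));
    }
}
```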
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880745#comment-13880745 ]

Bikas Saha commented on YARN-1618:
----------------------------------

An app goes NEW->NEW_SAVING upon receiving START. It goes NEW_SAVING->SUBMITTED after the app is saved. It goes NEW_SAVING->FINAL_SAVING if killed while saving. All of these work fine.

Given the above transitions, the app should go NEW->KILLED if it's killed before receiving the START event. The START event should be ignored in the KILLED state (currently it is not ignored). So if START comes after KILL, it's a no-op. If START comes before KILL, the state store is fine, since the app will first be saved and then updated.

It's interesting that we caught the race such that KILL came before START. The START should come almost immediately after the RMAppImpl object is created in the NEW state during regular app submission. Karthik, are we sure that this happened?

This should not happen during recovery, since the RMAppImpl moves NEW->NEXT_STATE after receiving the RECOVER event. RPC servers should not be running during recovery. Vinod, is it still the case that RPC servers are started after recovery is complete?

Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
-----------------------------------------------------------------------------------------------------------

Key: YARN-1618
URL: https://issues.apache.org/jira/browse/YARN-1618
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1618-1.patch

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.

Previous description:
ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.
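
The transitions proposed in the comment above can be written down as a small transition function. This is a sketch of the proposal only, not the RMAppImpl state machine: AppStateSketch is a hypothetical class, APP_SAVED is an assumed name for the "app is saved" event, and only the states and events mentioned in the comment are modeled.

```java
public class AppStateSketch {

    enum State { NEW, NEW_SAVING, SUBMITTED, FINAL_SAVING, KILLED }

    // START and KILL are named in the comment; APP_SAVED is an assumed name.
    enum Event { START, APP_SAVED, KILL }

    // Returns the next state for the transitions listed in the comment and
    // rejects any combination the comment does not mention.
    static State next(State current, Event event) {
        switch (current) {
            case NEW:
                if (event == Event.START) return State.NEW_SAVING;
                if (event == Event.KILL) return State.KILLED; // killed before START: nothing saved yet
                break;
            case NEW_SAVING:
                if (event == Event.APP_SAVED) return State.SUBMITTED;
                if (event == Event.KILL) return State.FINAL_SAVING; // killed while saving
                break;
            case KILLED:
                if (event == Event.START) return State.KILLED; // late START is a no-op
                break;
            default:
                break;
        }
        throw new IllegalStateException("Unexpected " + event + " in state " + current);
    }

    public static void main(String[] args) {
        // The race from the comment: KILL arrives before START.
        State s = next(State.NEW, Event.KILL);
        s = next(s, Event.START);
        System.out.println("final state: " + s);
    }
}
```

In this sketch an app killed in NEW never reaches FINAL_SAVING, so no update is issued for an entry that was never saved.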