[jira] [Updated] (YUNIKORN-587) Allocated resources on a node could become negative

2021-03-19 Thread Weiwei Yang (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-587: Parent: YUNIKORN-553 Issue Type: Sub-task (was: Bug) > Allocated resources on a node

[jira] [Commented] (YUNIKORN-587) Allocated resources on a node could become negative

2021-03-19 Thread Weiwei Yang (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305306#comment-17305306 ] Weiwei Yang commented on YUNIKORN-587: Increase the severity to blocker. We need to look into

[jira] [Updated] (YUNIKORN-587) Allocated resources on a node could become negative

2021-03-19 Thread Weiwei Yang (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-587: Priority: Critical (was: Major) > Allocated resources on a node could become negative >

[jira] [Updated] (YUNIKORN-588) Placeholder pods are not cleaned up timely when the Spark driver fails

2021-03-19 Thread Weiwei Yang (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-588: Parent: YUNIKORN-553 Issue Type: Sub-task (was: Bug) > Placeholder pods are not

[jira] [Commented] (YUNIKORN-588) Placeholder pods are not cleaned up timely when the Spark driver fails

2021-03-19 Thread Weiwei Yang (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305305#comment-17305305 ] Weiwei Yang commented on YUNIKORN-588: hi [~yuchaoran2011] have you tried to delete the spark

[jira] [Updated] (YUNIKORN-588) Placeholder pods are not cleaned up timely when the Spark driver fails

2021-03-19 Thread Chaoran Yu (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoran Yu updated YUNIKORN-588: Description: When a Spark job is gang scheduled, if the driver pod fails immediately upon

[jira] [Created] (YUNIKORN-588) Placeholder pods are not cleaned up timely when the Spark driver fails

2021-03-19 Thread Chaoran Yu (Jira)
Chaoran Yu created YUNIKORN-588: Summary: Placeholder pods are not cleaned up timely when the Spark driver fails Key: YUNIKORN-588 URL: https://issues.apache.org/jira/browse/YUNIKORN-588 Project:

[jira] [Created] (YUNIKORN-587) Allocated resources on a node could become negative

2021-03-19 Thread Chaoran Yu (Jira)
Chaoran Yu created YUNIKORN-587: Summary: Allocated resources on a node could become negative Key: YUNIKORN-587 URL: https://issues.apache.org/jira/browse/YUNIKORN-587 Project: Apache YuniKorn

[jira] [Commented] (YUNIKORN-584) App recovery is skipped when applicationID is not set in pods' label

2021-03-19 Thread Chaoran Yu (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305246#comment-17305246 ] Chaoran Yu commented on YUNIKORN-584: Thanks to Weiwei's help, we were able to locate the root

[jira] [Updated] (YUNIKORN-584) App recovery is skipped when applicationID is not set in pods' label

2021-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-584: Labels: pull-request-available (was: ) > App recovery is skipped when applicationID is

[jira] [Updated] (YUNIKORN-584) App recovery is skipped when applicationID is not set in pods' label

2021-03-19 Thread Weiwei Yang (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-584: Summary: App recovery is skipped when applicationID is not set in pods' label (was: The node

[jira] [Updated] (YUNIKORN-586) Enhance placeholder cleanup on timeout

2021-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-586: Labels: pull-request-available (was: ) > Enhance placeholder cleanup on timeout >

[jira] [Commented] (YUNIKORN-556) Expose pod level events when an app is failed in scheduling

2021-03-19 Thread Kinga Marton (Jira)
[ https://issues.apache.org/jira/browse/YUNIKORN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304861#comment-17304861 ] Kinga Marton commented on YUNIKORN-556: Thank you [~avsamit6600] for the help. [~tingyao]