[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460.patch

Trunk patch. After thinking about this more and talking to Arun about it, I have made this patch contain only the check for stopped. It returns an EMPTY_ALLOCATION, similar to what it did before when the application was null on a call to allocate(). I will file a follow-up JIRA to investigate whether the AMResponse should have another field so that the RM can send the AM useful error information other than just reboot.

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Blocker
> Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch,
> YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch,
> YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
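The issue description quoted above says that a stale entry in the active-user list leaves everyone else in the queue with fewer resources. The following is a rough arithmetic sketch of why that happens, not the CapacityScheduler's actual computation. It assumes a simplified model in which each active user is limited to an equal share of the queue, but never less than the configured minimum percentage. The class name `UserLimitSketch`, the method `userLimit`, and the numbers are all hypothetical.

```java
// Simplified, assumed model of a per-user limit. It is only meant to show how
// one stale "active" user shrinks the share of the users who are really active.
final class UserLimitSketch {

    /**
     * Equal share among active users, but never below the configured
     * minimum percentage of the queue's capacity.
     */
    static int userLimit(int queueCapacity, int activeUsers, int minUserLimitPercent) {
        int equalShare = queueCapacity / Math.max(activeUsers, 1);
        int minShare = queueCapacity * minUserLimitPercent / 100;
        return Math.max(equalShare, minShare);
    }

    public static void main(String[] args) {
        // Two users really have running applications.
        System.out.println(userLimit(100, 2, 25)); // prints 50
        // A third user was left in the list after its application finished.
        System.out.println(userLimit(100, 3, 25)); // prints 33
    }
}
```

In this model the two real users drop from 50 to 33 units each, and the remaining third of the queue is reserved for a user who has nothing running.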
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460-branch-0.23.patch

branch-23 patch with just the stopped check.

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Blocker
> Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch,
> YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Priority: Blocker (was: Critical)

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Blocker
> Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch,
> YARN-460.patch, YARN-460.patch, YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460-branch-0.23.patch

Updated branch-0.23 patch that has Allocation return an error, after which ApplicationMasterService sends a reboot command to the AM.

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Critical
> Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch,
> YARN-460.patch, YARN-460.patch, YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460.patch

Trunk and branch-2 patch. Unfortunately, I couldn't easily come up with a unit test that hits the application-stopped condition (without hitting the null check), because the data structures are private.

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Critical
> Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch,
> YARN-460.patch, YARN-460.patch, YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460-branch-0.23.patch

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Critical
> Attachments: YARN-460-branch-0.23.patch, YARN-460.patch,
> YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460.patch

Updated patch for trunk/branch-2.

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Critical
> Attachments: YARN-460-branch-0.23.patch, YARN-460.patch,
> YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-460:
---
    Attachment: YARN-460.patch

So I think we can simply track whether the application gets stopped and then check that in the allocate() call before really processing it. All the stopping/removing of the application happens in CS.doneApplication, and the race is really between the calls in that function and the fact that allocate() isn't synchronized. No other paths I could find should cause issues, since most of the other functions in CS are synchronized and wouldn't run while doneApplication is happening.

Here is a preliminary patch that I am going to test some more. The checks for stopped in the SchedulerApp are extra; I was just being paranoid.

> CS user left in list of active users for the queue even when application
> finished
> ---
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Critical
> Attachments: YARN-460.patch
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.
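The race described in the comment above, and the proposed stopped check, can be illustrated with a small self-contained sketch. This is not the attached patch or the real CapacityScheduler code. Every name here (`StoppedCheckSketch`, `App`, `lookup`, the string results) is a hypothetical stand-in, and the model assumes one application per user. The point is only that allocate() is not synchronized on the scheduler, so it can hold an application reference that doneApplication() is concurrently tearing down. Checking a stopped flag under the application's own lock keeps the late call from marking the user active again.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the race: not the real scheduler classes.
final class StoppedCheckSketch {
    static final String EMPTY_ALLOCATION = "EMPTY_ALLOCATION";
    static final String ALLOCATED = "ALLOCATED";

    static final class App {
        final String user;
        private boolean stopped; // guarded by this App's monitor
        App(String user) { this.user = user; }
    }

    private final Map<String, App> applications = new ConcurrentHashMap<>();
    private final Set<String> activeUsers = ConcurrentHashMap.newKeySet();

    void addApplication(String appId, String user) {
        applications.put(appId, new App(user));
    }

    App lookup(String appId) {
        return applications.get(appId);
    }

    // Deliberately NOT synchronized on the scheduler, like allocate() in the report.
    String allocate(App app) {
        if (app == null) {
            return EMPTY_ALLOCATION; // application already removed
        }
        synchronized (app) {
            if (app.stopped) {
                return EMPTY_ALLOCATION; // being stopped or already stopped
            }
            activeUsers.add(app.user); // a resource request makes the user active
            return ALLOCATED;
        }
    }

    // Synchronized on the scheduler, like the other scheduler entry points.
    synchronized void doneApplication(String appId) {
        App app = applications.get(appId);
        if (app == null) {
            return;
        }
        synchronized (app) {
            app.stopped = true;
            activeUsers.remove(app.user);
        }
        applications.remove(appId);
    }

    boolean isUserActive(String user) {
        return activeUsers.contains(user);
    }

    public static void main(String[] args) {
        StoppedCheckSketch cs = new StoppedCheckSketch();
        cs.addApplication("app1", "alice");
        App inFlight = cs.lookup("app1");           // allocate() has fetched the app
        System.out.println(cs.allocate(inFlight));  // prints ALLOCATED
        cs.doneApplication("app1");                 // application finishes meanwhile
        System.out.println(cs.allocate(inFlight));  // prints EMPTY_ALLOCATION
        System.out.println(cs.isUserActive("alice")); // prints false
    }
}
```

Without the `app.stopped` check, the second `allocate(inFlight)` call would add "alice" back to the active-user set after doneApplication() had removed her, which is the symptom reported in this issue.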