[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Djellel Eddine Difallah updated YARN-897:

Attachment: YARN-897-3.patch

A new version of the patch, in which completedContainer takes the CSQueue to reinsert, as suggested by [~acmurthy]. The patch now also contains the unit test that checks the childQueues sort order.

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 2.0.4-alpha
Reporter: Djellel Eddine Difallah
Priority: Blocker
Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch, YARN-897-3.patch

The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Djellel Eddine Difallah updated YARN-897:

Attachment: YARN-897-4.patch

[~acmurthy] Unfortunately, as I pointed out above with Omkar, we have to iterate because at that point the childQueues are already out of order and we can't use the TreeSet methods. For the same reason, assignContainersToChildQueues also iterates and then removes and adds. This patch moves the code of reinsertQueue inline into completedContainer.

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 2.0.4-alpha
Reporter: Djellel Eddine Difallah
Priority: Blocker
Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch, YARN-897-3.patch, YARN-897-4.patch
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Djellel Eddine Difallah updated YARN-897:

Attachment: YARN-897-1.patch

Attached is a first patch attempt to address the bug: upon container completion, which triggers completedContainer(), remove and reinsert the queue into its parent's childQueues. This operation is done recursively, starting from the leafQueue where the container was released. Thus, by handling both cases in which usedCapacity changes (assignment and completion), the TreeSet remains properly sorted.

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah
Attachments: TestBugParentQueue.java, YARN-897-1.patch
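The bottom-up walk described in this update can be illustrated with a minimal sketch. It is not the code from YARN-897-1.patch: QueueNode, childCompleted and reinsert are hypothetical names, the integer "used" count stands in for usedCapacity, and locking is left out entirely. It only assumes what the message states, namely that each parent re-sorts the child whose usage changed and then passes the completion on to its own parent.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

public class BottomUpResort {
    /** Hypothetical queue node; "used" counts containers in use in this subtree. */
    static final class QueueNode {
        final String name;
        final QueueNode parent;
        int used;
        // Least-used child first; the name breaks ties between distinct queues.
        final TreeSet<QueueNode> childQueues = new TreeSet<>(
            Comparator.comparingInt((QueueNode q) -> q.used).thenComparing(q -> q.name));

        QueueNode(String name, QueueNode parent, int used) {
            this.name = name;
            this.parent = parent;
            this.used = used;
            if (parent != null) {
                parent.childQueues.add(this);
            }
        }

        /** Called on the leaf queue whose container was released. */
        void completedContainer(int released) {
            used -= released;
            if (parent != null) {
                parent.childCompleted(this, released);
            }
        }

        /** Re-sorts the child, updates this queue, then continues toward the root. */
        private void childCompleted(QueueNode child, int released) {
            reinsert(child);
            used -= released;
            if (parent != null) {
                parent.childCompleted(this, released);
            }
        }

        /** The child's key already changed, so it is located by identity, not by lookup. */
        private void reinsert(QueueNode child) {
            for (Iterator<QueueNode> it = childQueues.iterator(); it.hasNext();) {
                if (it.next() == child) {
                    it.remove();
                    break;
                }
            }
            childQueues.add(child);
        }
    }

    /** Builds root -> {a -> {a1, a2}, b}, releases all of a1's containers, reports the new order. */
    static String run() {
        QueueNode root = new QueueNode("root", null, 8);
        QueueNode a = new QueueNode("a", root, 5);
        new QueueNode("b", root, 3);
        QueueNode a1 = new QueueNode("a1", a, 4);
        new QueueNode("a2", a, 1);

        a1.completedContainer(4);
        return root.childQueues.first().name + " " + a.childQueues.first().name + " " + root.used;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch, a1 moves to the front of a's children and a moves ahead of b under root, so both levels offer their least-used queue first again after the completion.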
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706348#comment-13706348 ]

Djellel Eddine Difallah commented on YARN-897:

Omkar, thanks for the feedback.

{quote}any reason for this even after this patch? if we don't see any other issues then why not just use childQueues.remove instead of iterating?{quote}
The tree is already out of order because of the new usedCapacity, so remove() won't work. We have to iterate and then add() to fix the order.

{quote}reinsertQueue could be marked synchronized? thoughts? But yeah.. without that too it is thread safe as we are locking it at CapacityScheduler.nodeUpdate(). but still it is better to mark it.{quote}
OK, it sounds reasonable to put a synchronized there.

{quote}LOG.info("Re-sorting queues since queue got completed: " + childQueue.getQueuePath() + nit. line > 80{quote}
Sure.

{quote}at present we send the container completed event to leaf queue and then keep propagating it till root. why not send the event to root, grab the locks from root to leaf, and update it? any thoughts?{quote}
Because the released container is linked to a leaf queue, and we have to walk bottom-up to figure out which parent to propagate to. The assignment phase, however, works the way you described.

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah
Attachments: TestBugParentQueue.java, YARN-897-1.patch
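The first answer in this comment (remove() fails, iteration works) can be reproduced in isolation. The following is a minimal sketch, not code from the patch: ChildQueue and reinsert are hypothetical stand-ins, and the only assumption is a TreeSet ordered by a mutable used capacity with the queue name as tie-breaker.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

public class ReinsertByIteration {
    /** Hypothetical stand-in for a child queue: a name plus a mutable used capacity. */
    static final class ChildQueue {
        final String name;
        double usedCapacity;

        ChildQueue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    /** Least-used queue first; the name breaks ties between distinct queues. */
    static final Comparator<ChildQueue> BY_USED_CAPACITY =
        Comparator.comparingDouble((ChildQueue q) -> q.usedCapacity)
                  .thenComparing(q -> q.name);

    /**
     * Finds the child by identity while iterating, unlinks it through the
     * iterator (no comparator lookup involved), then adds it back so that it
     * lands at the position matching its current used capacity.
     */
    static void reinsert(TreeSet<ChildQueue> childQueues, ChildQueue child) {
        for (Iterator<ChildQueue> it = childQueues.iterator(); it.hasNext();) {
            if (it.next() == child) {
                it.remove();
                break;
            }
        }
        childQueues.add(child);
    }

    /** Compares a direct remove() with the iterate-then-add approach on a stale entry. */
    static String run() {
        TreeSet<ChildQueue> childQueues = new TreeSet<>(BY_USED_CAPACITY);
        ChildQueue a = new ChildQueue("a", 0.1);
        ChildQueue b = new ChildQueue("b", 0.2);
        ChildQueue c = new ChildQueue("c", 0.3);
        childQueues.add(a);
        childQueues.add(b);
        childQueues.add(c);

        // The sort key changes while c is still stored under its old position.
        c.usedCapacity = 0.0;

        boolean removedDirectly = childQueues.remove(c);
        reinsert(childQueues, c);
        return removedDirectly + " " + childQueues.first().name + " "
            + childQueues.contains(c) + " " + childQueues.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On OpenJDK's TreeSet this prints "false c true 3": the direct remove() searches in the wrong subtree and finds nothing, while the iterator-based removal followed by add() restores the order without duplicating the entry.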
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Djellel Eddine Difallah updated YARN-897:

Attachment: YARN-897-2.patch

Patch reflecting Omkar's comments.
1) add synchronized to reinsertQueue
2) reduce line length

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah
Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Djellel Eddine Difallah updated YARN-897:

Attachment: YARN-897-1.patch

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah
Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch
[jira] [Created] (YARN-897) CapacityScheduler wrongly sorted queues
Djellel Eddine Difallah created YARN-897:

Summary: CapacityScheduler wrongly sorted queues
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah

The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.
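The failure mode in this report can be shown with a minimal sketch outside Hadoop. ChildQueue and BY_USED_CAPACITY are hypothetical stand-ins for the real queue classes and their comparator; the only assumption is that the sort key of an element changes while the element is stored in a TreeSet, which is what an unhandled container completion amounts to.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class StaleTreeSetDemo {
    /** Hypothetical stand-in for a child queue: a name plus a mutable used capacity. */
    static final class ChildQueue {
        final String name;
        double usedCapacity;

        ChildQueue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    /** Least-used queue first; the name breaks ties so distinct queues never compare equal. */
    static final Comparator<ChildQueue> BY_USED_CAPACITY =
        Comparator.comparingDouble((ChildQueue q) -> q.usedCapacity)
                  .thenComparing(q -> q.name);

    /**
     * Sorts three queues, then lets queue "c" release everything without
     * re-sorting. Reports which queue the set offers first, whether
     * contains(c) still succeeds, and whether remove(c) succeeds.
     */
    static String run() {
        TreeSet<ChildQueue> childQueues = new TreeSet<>(BY_USED_CAPACITY);
        ChildQueue a = new ChildQueue("a", 0.1);
        ChildQueue b = new ChildQueue("b", 0.2);
        ChildQueue c = new ChildQueue("c", 0.3);
        childQueues.add(a);
        childQueues.add(b);
        childQueues.add(c);

        // Container completion changes the sort key while c sits in the tree.
        c.usedCapacity = 0.0;

        String first = childQueues.first().name;
        boolean found = childQueues.contains(c);
        boolean removed = childQueues.remove(c);
        return first + " " + found + " " + removed;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On OpenJDK's TreeSet this prints "a false false": queue c now has the least used capacity but is still served last, and lookups for it walk into the wrong subtree and miss it.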
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Djellel Eddine Difallah updated YARN-897:

Attachment: TestBugParentQueue.java

Simple JUnit test that triggers the bug.

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah
Attachments: TestBugParentQueue.java
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699373#comment-13699373 ]

Djellel Eddine Difallah commented on YARN-897:

We spotted this bug while experimenting with dynamic queue updates. The TreeSet methods .contains() and .remove() failed to retrieve a queue that we knew was there, which hinted that the tree was not properly sorted.

The attached test is a [simple junit test | https://issues.apache.org/jira/secure/attachment/12590676/TestBugParentQueue.java] inspired by the existing capacity scheduler tests. It simulates the scenario that [~curino] describes above and displays the order in which the childQueues are left after a couple of container assignments and completions.

I will post a first version of a patch that re-inserts the recently completed container's queue (and all its parents) into their respective parents' childQueues.

CapacityScheduler wrongly sorted queues
---
Key: YARN-897
URL: https://issues.apache.org/jira/browse/YARN-897
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Djellel Eddine Difallah
Attachments: TestBugParentQueue.java