[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706348#comment-13706348
 ] 

Djellel Eddine Difallah commented on YARN-897:
----------------------------------------------

Omkar, thanks for the feedback.
{quote}any reason for this even after this patch? if we don't see any other 
issues then why not just use childQueues.remove instead of iterating?{quote}
The tree is already out of order because of the new usedCapacity, so remove() 
won't work: the lookup follows the comparator and can miss the queue. We have 
to iterate to remove it and then add() it back to fix the order.
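A minimal sketch of that failure mode, not the actual patch: DemoQueue, 
ReinsertDemo, BY_USED and reinsert() are hypothetical names made up for 
illustration, and the comparator only approximates sorting by used capacity.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical stand-in for a child queue; not the real scheduler class.
class DemoQueue {
    final String name;
    float usedCapacity; // mutable sort key, as in the bug report

    DemoQueue(String name, float usedCapacity) {
        this.name = name;
        this.usedCapacity = usedCapacity;
    }
}

class ReinsertDemo {
    // Assumed ordering: least used capacity first, name as tie-breaker.
    static final Comparator<DemoQueue> BY_USED = (a, b) -> {
        int c = Float.compare(a.usedCapacity, b.usedCapacity);
        return c != 0 ? c : a.name.compareTo(b.name);
    };

    // Remove by identity through the iterator (no comparator lookup),
    // then add() so the queue lands at the position matching its new key.
    static void reinsert(TreeSet<DemoQueue> queues, DemoQueue changed) {
        for (Iterator<DemoQueue> it = queues.iterator(); it.hasNext();) {
            if (it.next() == changed) {
                it.remove();
                break;
            }
        }
        queues.add(changed);
    }

    public static void main(String[] args) {
        TreeSet<DemoQueue> queues = new TreeSet<>(BY_USED);
        DemoQueue a = new DemoQueue("a", 0.1f);
        DemoQueue b = new DemoQueue("b", 0.5f);
        DemoQueue c = new DemoQueue("c", 0.9f);
        queues.add(a);
        queues.add(b);
        queues.add(c);

        // A container completes in c: the key changes while c is in the set.
        c.usedCapacity = 0.0f;

        // The lookup navigates with the new key and misses c.
        System.out.println("remove(c) found it: " + queues.remove(c));
        // c is now the least used queue but is not served first.
        System.out.println("first before fix: " + queues.first().name);

        reinsert(queues, c);
        System.out.println("first after fix: " + queues.first().name);
    }
}
```

The iterator removal works because it unlinks the node it is currently 
positioned on instead of searching for the key.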
{quote}reinsertQueue could be marked synchronized? thoughts? But yeah.. without 
that too it is thread safe as we are locking it at 
CapacityScheduler.nodeUpdate(). but still it is better to mark it.{quote}
OK, it sounds reasonable to mark it synchronized.
{quote}LOG.info("Re-sorting queues since queue got completed: " + 
childQueue.getQueuePath() +
nit. line > 80{quote}
Sure.
{quote}at present we send the container completed event to leaf queue and then 
keep propagating it till root. why not send the event to root, grab the locks 
from root->leaf and update it? any thoughts?{quote}
Because the released container is linked to a leaf queue, we have to walk 
bottom-up to figure out which parents to propagate to. The assignment phase, 
however, works the way you described.
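A rough sketch of the bottom-up walk, under simplifying assumptions: QueueNode 
and its fields are hypothetical, and locking and re-sorting of each parent's 
children are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal queue node; the real queues carry far more state.
class QueueNode {
    final String name;
    final QueueNode parent; // null for the root
    int usedContainers;

    QueueNode(String name, QueueNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Called on the leaf that owned the released container. Only the
    // ancestors of that leaf are touched; siblings are never visited.
    // Returns the names of the queues updated, leaf first.
    List<String> completedContainer() {
        List<String> visited = new ArrayList<>();
        for (QueueNode q = this; q != null; q = q.parent) {
            q.usedContainers--;
            visited.add(q.name);
        }
        return visited;
    }
}
```

Starting from the root instead would require searching the tree for the leaf 
that owns the container, whereas the leaf already knows its chain of parents.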
                
> CapacityScheduler wrongly sorted queues
> ---------------------------------------
>
>                 Key: YARN-897
>                 URL: https://issues.apache.org/jira/browse/YARN-897
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Djellel Eddine Difallah
>         Attachments: TestBugParentQueue.java, YARN-897-1.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures that the queue with the least 
> UsedCapacity receives resources next. On containerAssignment we correctly 
> update the order, but we fail to do so on container completion. This corrupts 
> the TreeSet structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
