[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109972#comment-16109972
 ] 

Wangda Tan commented on YARN-6901:
----------------------------------

bq. I agree it would be nice if getQueuePath were lockless, although long-term 
I think something like YARN-6917 would be preferable to a volatile approach. 
I'm OK with volatile in the short-term.
If there's no existing issue we can find, I agree to delay the fix to 
YARN-6917. 

bq. That does not seem to have anything to do with reaching up the hierarchy.
Inside AbstractContainerAllocator:
{code}
        application
            .getCSLeafQueue()
            .getOrderingPolicy()
{code}
To me grabbing lock of leaf queue while holding lock of app is also not a good 
practice. 

bq. It also looks like it introduces a bug if two threads try to call 
setOrderingPolicy at the same time 
Actually orderingPolicy can be set only when queue do reinitialize (which 
protected by queue's synchronize lock). Do you think does it still have the 
issue in the case?  

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> ----------------------------------------------------------------
>
>                 Key: YARN-6901
>                 URL: https://issues.apache.org/jira/browse/YARN-6901
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> ================
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=61, line=277 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.selectCandidates(java.util.Map,
>  org.apache.hadoop.yarn.api.records.Resource, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=374, line=138 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue,
>  org.apache.hadoop.yarn.api.records.Resource) @bci=264, line=342 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule()
>  @bci=34, line=202 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy()
>  @bci=4, line=81 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run()
>  @bci=23, line=92 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to