[ 
https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-8232:
----------------------------
    Description: 
RMContainer has a member variable queuename to store which queue the container 
belongs to. Preemption uses this information to deduct preemptable resources.

When an RM HA failover happens and the scheduler recovers RMContainers from NM 
reports, the queue name is not set on the recovered RMContainers. In this 
situation, when more than one preemption selector is in use (for example, when 
intra-queue preemption is enabled), 
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
throws a YarnRuntimeException because it tries to look up the container's 
TempQueuePerPartition while the container's queue name is null.

 

Our patch solves this problem by setting the container's queue name when 
containers are recovered. The patch is based on branch-2.8.3.
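
To illustrate the failure mode and the idea behind the fix, here is a minimal, self-contained sketch. It does not use the real YARN classes: SketchContainer, SketchPreemption and SketchRecovery are hypothetical stand-ins, and IllegalStateException stands in for the YarnRuntimeException mentioned above. It only shows why a lookup keyed by queue name fails for a recovered container whose queue name was never set, and how setting it during recovery avoids that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for RMContainer (not the real YARN class).
class SketchContainer {
    private final String id;
    private String queueName; // stays null if recovery never sets it

    SketchContainer(String id) { this.id = id; }
    String getId() { return id; }
    String getQueueName() { return queueName; }
    void setQueueName(String queueName) { this.queueName = queueName; }
}

// Hypothetical stand-in for the preemption bookkeeping keyed by queue name.
class SketchPreemption {
    // queue name -> preemptable resource units still available in that queue
    private final Map<String, Integer> preemptableByQueue = new HashMap<>();

    void addQueue(String queueName, int preemptable) {
        preemptableByQueue.put(queueName, preemptable);
    }

    // Mimics the deduction step: look up the queue by name, then subtract.
    void deduct(SketchContainer c, int amount) {
        Integer current = preemptableByQueue.get(c.getQueueName());
        if (current == null) {
            // Stand-in for the exception described in the issue.
            throw new IllegalStateException("No queue found for container "
                + c.getId() + " (queue name: " + c.getQueueName() + ")");
        }
        preemptableByQueue.put(c.getQueueName(), current - amount);
    }

    int remaining(String queueName) {
        return preemptableByQueue.get(queueName);
    }
}

// Hypothetical recovery helpers contrasting the buggy and the patched behaviour.
class SketchRecovery {
    // Buggy path: the container is rebuilt but its queue name is left null.
    static SketchContainer recoverWithoutQueue(String id) {
        return new SketchContainer(id);
    }

    // Patched path: the queue name is set as part of recovery.
    static SketchContainer recoverWithQueue(String id, String queueName) {
        SketchContainer c = new SketchContainer(id);
        c.setQueueName(queueName);
        return c;
    }
}
```

In this sketch, calling deduct() with a container from recoverWithoutQueue() throws, while a container from recoverWithQueue() is deducted from its queue normally.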

 

 

> RMContainer lost Queue name when recovered by RM
> ------------------------------------------------
>
>                 Key: YARN-8232
>                 URL: https://issues.apache.org/jira/browse/YARN-8232
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.3
>            Reporter: Hu Ziqian
>            Priority: Major


