[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
----------------------------
          Flags: Patch
     Attachment: YARN_8232.patch
    Description: 
RMContainer has a member variable, queuename, that stores the queue the 
container belongs to. When an RM HA failover happens and the scheduler 
recovers RMContainers based on NM reports, the queue name is not recovered and 
is always null.

This causes many problems; preemption is one example. When more than one 
preemption selector is used (for example, when intra-queue preemption is 
enabled), preemption uses each container's queue name to deduct preemptable 
resources. The details are in
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
If the container's queue name is null, this method throws a 
YarnRuntimeException because it tries to get the container's 
TempQueuePerPartition, and the preemption fails.
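To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use the real Hadoop classes: SimpleContainer, QueueSnapshot and PreemptionSketch are hypothetical stand-ins, only memory is tracked, and IllegalStateException stands in for YarnRuntimeException. It models just the behavior described above: deduction looks up per-queue state by the container's queue name, so a null name finds nothing and the call fails.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RMContainer: only the fields this sketch needs.
class SimpleContainer {
  final String id;
  final String queueName; // null models a container recovered without its queue name

  SimpleContainer(String id, String queueName) {
    this.id = id;
    this.queueName = queueName;
  }
}

// Hypothetical stand-in for per-queue preemption state (TempQueuePerPartition).
class QueueSnapshot {
  long preemptableMb;

  QueueSnapshot(long preemptableMb) {
    this.preemptableMb = preemptableMb;
  }
}

class PreemptionSketch {
  // Deducts a selected container's memory from its queue's preemptable amount.
  // The queue is found by name, so a null queue name matches no entry.
  static void deduct(Map<String, QueueSnapshot> queues, SimpleContainer c, long mb) {
    QueueSnapshot q = queues.get(c.queueName); // HashMap permits get(null)
    if (q == null) {
      // IllegalStateException stands in for YarnRuntimeException here.
      throw new IllegalStateException(
          "No queue state for container " + c.id + ", queue=" + c.queueName);
    }
    q.preemptableMb -= mb;
  }
}
{code}

Calling deduct with a container whose queueName is null throws, which corresponds to the preemption failure reported in this issue.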

Our patch solves this problem by setting the container's queue name when 
recovering containers. The patch is based on branch-2.8.3.
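The idea of the fix can be sketched the same way, assuming the scheduler can resolve each recovered container's application to the queue that application was recovered into. RecoveredContainer, RecoverySketch and the appToQueue map are hypothetical names for illustration; this is not the code in YARN_8232.patch.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a container rebuilt from an NM report after failover.
class RecoveredContainer {
  final String containerId;
  final String appId;
  String queueName; // stays null unless recovery fills it in

  RecoveredContainer(String containerId, String appId) {
    this.containerId = containerId;
    this.appId = appId;
  }
}

// Hypothetical recovery step; not the code in the attached patch.
class RecoverySketch {
  // Application ID -> queue name, assumed known once applications are recovered.
  final Map<String, String> appToQueue = new HashMap<>();

  RecoveredContainer recover(String containerId, String appId) {
    RecoveredContainer c = new RecoveredContainer(containerId, appId);
    // The fix in outline: copy the owning application's queue name onto the
    // recovered container instead of leaving the field null.
    c.queueName = appToQueue.get(appId);
    return c;
  }
}
{code}

In this sketch, a container whose application is unknown still ends up with a null queue name; how the actual patch handles that case is not described in this issue.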

  was:
RMContainer has a member variable queuename to store which queue the container 
belongs to. Preemption uses this information to deduct preemptable resources.

When RM HA happens and RMContainers are recovered by the scheduler based on NM 
reports, we did not set the queue name on the RMContainers. In this 
situation, when more than one preemption selector is used (for example, when 
intra-queue preemption is enabled), the 
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
throws a YarnRuntimeException because it tries to get the 
TempQueuePerPartition of a container whose queue name is null.

Our patch solves this problem by setting the container's queue name when 
recovering containers. The patch is based on branch-2.8.3.

        Summary: RMContainer lost queue name when RM HA happens  (was: 
RMContainer lost Queue name when recovered by RM)

> RMContainer lost queue name when RM HA happens
> ----------------------------------------------
>
>                 Key: YARN-8232
>                 URL: https://issues.apache.org/jira/browse/YARN-8232
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.3
>            Reporter: Hu Ziqian
>            Priority: Major
>         Attachments: YARN_8232.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
