[ 
https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458291#comment-16458291
 ] 

Wangda Tan commented on YARN-8232:
----------------------------------

Thanks [~ziqian hu] for reporting this and working on the patch.

Could you create a patch on top of trunk? That's typically where we make fixes.

The patch should be named (JIRA_NUMBER.version.patch). You can check 
https://wiki.apache.org/hadoop/HowToContribute for details.

As for the patch itself: instead of getting the application inside the function,
you can pass the queue name in from the calling function
({{recoverContainersOnNode}}), which saves one access to the scheduler
application.
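
A minimal sketch of that suggestion, using hypothetical stand-in types rather
than the real scheduler classes. Only {{recoverContainersOnNode}} is a name
from the code base; {{QueueNameRecoverySketch}}, {{ContainerReport}},
{{RecoveredContainer}}, {{createRecoveredContainer}} and the map-based
application lookup are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are the real YARN classes.
final class QueueNameRecoverySketch {

    /** Hypothetical stand-in for a container status reported by an NM. */
    public static final class ContainerReport {
        public final String containerId;
        public final String appId;

        public ContainerReport(String containerId, String appId) {
            this.containerId = containerId;
            this.appId = appId;
        }
    }

    /** Hypothetical stand-in for a recovered container that carries its queue name. */
    public static final class RecoveredContainer {
        public final String containerId;
        public final String queueName;

        public RecoveredContainer(String containerId, String queueName) {
            this.containerId = containerId;
            this.queueName = queueName;
        }
    }

    // appId -> queue name; stands in for looking up the scheduler application.
    private final Map<String, String> appQueues = new HashMap<>();

    // Counts application lookups so the "one lookup per container" claim is checkable.
    public int appLookups = 0;

    public void addApplication(String appId, String queueName) {
        appQueues.put(appId, queueName);
    }

    // Inner helper: takes the queue name as a parameter instead of
    // looking up the application a second time.
    private RecoveredContainer createRecoveredContainer(ContainerReport report,
                                                        String queueName) {
        return new RecoveredContainer(report.containerId, queueName);
    }

    // Outer function: does the single application lookup and passes the
    // queue name down to the helper.
    public List<RecoveredContainer> recoverContainersOnNode(List<ContainerReport> reports) {
        List<RecoveredContainer> recovered = new ArrayList<>();
        for (ContainerReport report : reports) {
            appLookups++;
            String queueName = appQueues.get(report.appId);
            if (queueName == null) {
                continue; // unknown application: nothing to recover in this toy model
            }
            recovered.add(createRecoveredContainer(report, queueName));
        }
        return recovered;
    }

    public static void main(String[] args) {
        QueueNameRecoverySketch scheduler = new QueueNameRecoverySketch();
        scheduler.addApplication("app_1", "root.a");
        List<RecoveredContainer> recovered = scheduler.recoverContainersOnNode(
            Arrays.asList(new ContainerReport("container_1", "app_1")));
        for (RecoveredContainer c : recovered) {
            System.out.println(c.containerId + " -> " + c.queueName);
        }
    }
}
```

The point of the shape is that the queue name is resolved once in the outer
function and handed down, so the recovered container never ends up with a null
queue name and the helper does not need its own application lookup.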

> RMContainer lost queue name when RM HA happens
> ----------------------------------------------
>
>                 Key: YARN-8232
>                 URL: https://issues.apache.org/jira/browse/YARN-8232
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.3
>            Reporter: Hu Ziqian
>            Assignee: Hu Ziqian
>            Priority: Major
>         Attachments: YARN_8232.patch
>
>
> RMContainer has a member variable queuename that stores which queue the 
> container belongs to. When an RM HA failover happens and RMContainers are 
> recovered by the scheduler based on NM reports, the queue name isn't 
> recovered and is always null.
> This causes several problems. One case is preemption: it uses the 
> container's queue name to deduct preemptable resources when more than one 
> preemption selector is in use (for example, when intra-queue preemption is 
> enabled). The details are in
> {code:java}
> CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
> If the container's queue name is null, this function throws a 
> YarnRuntimeException because it tries to get the container's 
> TempQueuePerPartition, and the preemption fails.
> Our patch solves this problem by setting the container's queue name when 
> recovering containers. The patch is based on branch-2.8.3.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
