[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated YARN-8232:
    Fix Version/s: 3.0.3

> RMContainer lost queue name when RM HA happens
> ----------------------------------------------
>
>                 Key: YARN-8232
>                 URL: https://issues.apache.org/jira/browse/YARN-8232
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.3
>            Reporter: Hu Ziqian
>            Assignee: Hu Ziqian
>            Priority: Major
>             Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 2.8.5
>
>         Attachments: YARN-8232-branch-2.8.3.001.patch, YARN-8232.001.patch,
>                      YARN-8232.002.patch, YARN-8232.003.patch
>
>
> RMContainer has a member variable queuename that stores which queue the
> container belongs to. When an RM HA failover happens and RMContainers are
> recovered by the scheduler based on NM reports, the queue name is not
> recovered and is always null.
> This causes several problems. Here is one case, in preemption. Preemption
> uses the container's queue name to deduct preemptable resources when more
> than one preemption selector is in use (for example, when intra-queue
> preemption is enabled). The details are in
> {code:java}
> CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
> If the container's queue name is null, this function throws a
> YarnRuntimeException because it tries to look up the container's
> TempQueuePerPartition, and the preemption fails.
> Our patch solves this problem by setting the container's queue name when
> containers are recovered. The patch is based on branch-2.8.3.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
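The failure mode described in the issue can be illustrated with a small, self-contained sketch. This is not the YARN code or the attached patch: `Container`, `QueueUsage`, `recover`, and `deduct` are hypothetical stand-ins for RMContainer, TempQueuePerPartition, the scheduler's recovery path, and the deduction step, and `IllegalStateException` stands in for YarnRuntimeException. It only shows why a null queue name breaks a per-queue lookup, and how restoring the name during recovery avoids that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these types are the real YARN classes.
class QueueNameRecoverySketch {

    /** Simplified stand-in for RMContainer: just the fields this sketch needs. */
    static final class Container {
        private final String id;
        private final long memoryMb;
        private String queueName; // stays null unless recovery sets it

        Container(String id, long memoryMb) {
            this.id = id;
            this.memoryMb = memoryMb;
        }

        String getId() { return id; }
        long getMemoryMb() { return memoryMb; }
        String getQueueName() { return queueName; }
        void setQueueName(String queueName) { this.queueName = queueName; }
    }

    /** Per-queue bookkeeping, standing in for TempQueuePerPartition. */
    static final class QueueUsage {
        long preemptableMb;

        QueueUsage(long preemptableMb) {
            this.preemptableMb = preemptableMb;
        }
    }

    /**
     * Rebuilds a container from a node report after failover.
     * restoreQueueName=false mimics the bug (queue name left null);
     * restoreQueueName=true mimics the idea of the fix.
     */
    static Container recover(String id, long memoryMb, String appQueue,
                             boolean restoreQueueName) {
        Container container = new Container(id, memoryMb);
        if (restoreQueueName) {
            container.setQueueName(appQueue);
        }
        return container;
    }

    /** Deducts a selected container's memory from its queue's preemptable amount. */
    static void deduct(Map<String, QueueUsage> queues, Container container) {
        // HashMap.get(null) returns null, so a missing queue name ends up here too.
        QueueUsage usage = queues.get(container.getQueueName());
        if (usage == null) {
            throw new IllegalStateException("No queue found for container "
                + container.getId() + " (queue name: " + container.getQueueName() + ")");
        }
        usage.preemptableMb -= container.getMemoryMb();
    }

    public static void main(String[] args) {
        Map<String, QueueUsage> queues = new HashMap<>();
        queues.put("default", new QueueUsage(4096));

        // Recovery that restores the queue name: the deduction succeeds.
        Container fixed = recover("container_1", 1024, "default", true);
        deduct(queues, fixed);
        System.out.println("preemptable after deduct: "
            + queues.get("default").preemptableMb);

        // Recovery that leaves the queue name null: the deduction fails.
        Container broken = recover("container_2", 1024, "default", false);
        try {
            deduct(queues, broken);
        } catch (IllegalStateException e) {
            System.out.println("deduct failed: " + e.getMessage());
        }
    }
}
```

In this sketch the queue name is taken from the application's queue at recovery time; where the real patch obtains it is not stated in the description, so treat that detail as an assumption.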
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-8232:
    Fix Version/s: 2.8.5
                   2.9.2
                   2.10.0

Thanks, [~ziqian hu]! We recently ran into the same issue on 2.8 as well, so I committed this to branch-3.0, branch-2, branch-2.9, and branch-2.8.
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
    Attachment: YARN-8232.003.patch
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
    Attachment: YARN-8232.002.patch
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
    Attachment: YARN-8232-branch-2.8.3.001.patch
                YARN-8232.001.patch
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
    Attachment:     (was: YARN_8232.patch)
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
    Description:
RMContainer has a member variable queuename that stores which queue the container belongs to. When an RM HA failover happens and RMContainers are recovered by the scheduler based on NM reports, the queue name is not recovered and is always null.

This situation causes some problems. Here is one case, in preemption. Preemption uses the container's queue name to deduct preemptable resources when more than one preemption selector is in use (for example, when intra-queue preemption is enabled). The details are in
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
If the container's queue name is null, this function throws a YarnRuntimeException because it tries to look up the container's TempQueuePerPartition, and the preemption fails.

Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.

  was:
RMContainer has a member variable queuename that stores which queue the container belongs to. When an RM HA failover happens and RMContainers are recovered by the scheduler based on NM reports, the queue name is not recovered and is always null.

This situation causes many problem. Here is one case, in preemption. Preemption uses the container's queue name to deduct preemptable resources when more than one preemption selector is in use (for example, when intra-queue preemption is enabled). The details are in
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
If the container's queue name is null, this function throws a YarnRuntimeException because it tries to look up the container's TempQueuePerPartition, and the preemption fails.

Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.
[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Ziqian updated YARN-8232:
          Flags: Patch
     Attachment: YARN_8232.patch
    Description:
RMContainer has a member variable queuename that stores which queue the container belongs to. When an RM HA failover happens and RMContainers are recovered by the scheduler based on NM reports, the queue name is not recovered and is always null.

This situation causes many problem. Here is one case, in preemption. Preemption uses the container's queue name to deduct preemptable resources when more than one preemption selector is in use (for example, when intra-queue preemption is enabled). The details are in
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
If the container's queue name is null, this function throws a YarnRuntimeException because it tries to look up the container's TempQueuePerPartition, and the preemption fails.

Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.

  was:
RMContainer has a member variable queuename that stores which queue the container belongs to. Preemption uses this information to deduct preemptable resources. When an RM HA failover happens and RMContainers are recovered by the scheduler based on NM reports, the queue name is not set on the RMContainers. In this situation, when more than one preemption selector is in use (for example, when intra-queue preemption is enabled),
{code:java}
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
throws a YarnRuntimeException because it tries to look up the container's TempQueuePerPartition while the container's queue name is null.

Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.

        Summary: RMContainer lost queue name when RM HA happens  (was: RMContainer lost Queue name when recovered by RM)