[jira] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly
[ https://issues.apache.org/jira/browse/YARN-11107 ] Xiping Zhang deleted comment on YARN-11107: - was (Author: zhangxiping): cc [~BilwaST] [~tangzhankun]

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.9.2, 3.3.0
> Reporter: Xiping Zhang
> Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, YARN-11107-branch-3.3.0.001.patch
>
> YARN NodeLabel is enabled in our production environment. We encountered an application whose AM blacklisted all NMs corresponding to the label in the queue, so other applications in the queue could not apply for computing resources. We found that the RM printed many logs of the form "Trying to fulfill reservation for application..."

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518555#comment-17518555 ] Xiping Zhang commented on YARN-11107: - cc [~leosun08] [~linyiqun] [~weichiu] [~hexiaoqiao] Could you help review this? Thanks.
[jira] [Commented] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518004#comment-17518004 ] Xiping Zhang commented on YARN-11107: - cc [~BilwaST] [~tangzhankun]
[jira] [Comment Edited] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517921#comment-17517921 ] Xiping Zhang edited comment on YARN-11107 at 4/6/22 9:56 AM: - I think that when NodeLabel is enabled, the RM should consider the label of the application when passing the number of NMs to the AM. When the number of blacklisted nodes exceeds 33% of the total number of label nodes, the AM releases the NMs in its blacklist. In DefaultAMSProcessor.java:

{code:java}
final class DefaultAMSProcessor implements ApplicationMasterServiceProcessor {
  ...
  public void allocate(ApplicationAttemptId appAttemptId,
      AllocateRequest request, AllocateResponse response) throws YarnException {
    ...
    // Consider whether NodeLabel is enabled
    response.setNumClusterNodes(getScheduler().getNumClusterNodes());
    ...
  }
}
{code}
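To make the failure mode concrete, here is a minimal, self-contained sketch (the class and method names are hypothetical, not Hadoop code) of the AM-side threshold logic the comment refers to: the blacklist is released only when blacklisted nodes exceed 33% of the node count the RM reports, so reporting the whole cluster's node count instead of the label's node count can make the threshold unreachable.

```java
public class BlacklistThresholdSketch {
    // The AM disables (releases) its blacklist when blacklisted nodes
    // exceed this fraction of the reported cluster size (YARN defaults
    // this knob to 0.33).
    static final double DISABLE_THRESHOLD = 0.33;

    // Returns true if the AM should release its blacklist.
    static boolean shouldDisableBlacklist(int blacklistedNodes, int reportedClusterNodes) {
        return blacklistedNodes > (int) (reportedClusterNodes * DISABLE_THRESHOLD);
    }

    public static void main(String[] args) {
        // Suppose 10 nodes carry the application's label in a 100-node
        // cluster, and the AM has blacklisted all 10 of them.
        // Against the whole cluster the threshold (33 nodes) is never reached:
        System.out.println(shouldDisableBlacklist(10, 100)); // false
        // Against the label's own node count (10), it is, and the
        // blacklist would be released as intended:
        System.out.println(shouldDisableBlacklist(10, 10));  // true
    }
}
```

This illustrates why the comment argues the RM should pass a label-aware node count in the AllocateResponse.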
[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiping Zhang updated YARN-11107: Summary: When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly (was: When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal)
[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiping Zhang updated YARN-11107: Attachment: YARN-11107-branch-3.3.0.001.patch
[jira] [Commented] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517945#comment-17517945 ] Juanjuan Tian commented on YARN-11108: --- [~wangda] could you help take a look?

> Unexpected preemptions happen when hierarchy queues case
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.9.2
> Reporter: Juanjuan Tian
> Assignee: Juanjuan Tian
> Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
> Unexpected preemptions happen in the hierarchical queues case. The issue is that a sub-queue can accept more resource than used + pending, leading to other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption happens unexpectedly:
>
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator: NAME: MSANRPAB PARTITION: persistent CUR: vCores:8285 ... PEN: ... TOTAL_PEN: ... RESERVED: ... GAR: vCores:9571 ... NORM: 0.3424696922302246 IDEAL_ASSIGNED: ... IDEAL_PREEMPT: ... ACTUAL_PREEMPT: vCores:0 ... UNTOUCHABLE: ... PREEMPTABLE: availableCpuCount:-36467 ... BONUS_WEIGHT: -1.0
[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-11108: -- Affects Version/s: 2.9.2
[jira] [Assigned] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian reassigned YARN-11108: - Assignee: Juanjuan Tian
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/6/22 8:44 AM: --- This issue is caused by the following. When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned) evaluates to (8GB, 2 cores), then Resources.min picks one operand as a whole, so accepted becomes (2GB, 3 cores) and the assigned CPU (3 cores) exceeds the queue's pending CPU (2 cores). !image-2022-04-06-16-29-57-871.png!
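The distinction above can be sketched in a few lines (this is not the Hadoop Resources API; the `Res` record and the dominant-share tie-break in `wholeMin` are illustrative assumptions). A whole-resource min returns one of its two operands in its entirety, while a componentwise min caps each dimension independently:

```java
public class ComponentwiseMinSketch {
    // Tiny stand-in for a YARN Resource: memory in GB, virtual cores.
    record Res(long memGB, int vcores) {}

    // Whole-resource min: pick one operand entirely, comparing by a
    // simple dominant share against the cluster totals (an assumed
    // stand-in for a DominantResourceCalculator comparison).
    static Res wholeMin(Res cluster, Res a, Res b) {
        double shareA = Math.max((double) a.memGB() / cluster.memGB(),
                                 (double) a.vcores() / cluster.vcores());
        double shareB = Math.max((double) b.memGB() / cluster.memGB(),
                                 (double) b.vcores() / cluster.vcores());
        return shareA <= shareB ? a : b;
    }

    // Componentwise min: minimum of each dimension independently.
    static Res componentwiseMin(Res a, Res b) {
        return new Res(Math.min(a.memGB(), b.memGB()),
                       Math.min(a.vcores(), b.vcores()));
    }

    public static void main(String[] args) {
        Res cluster = new Res(32, 16);
        Res avail = new Res(2, 3);   // available in the queue
        Res needed = new Res(8, 2);  // used + pending - idealAssigned
        // Whole-resource min keeps avail's 3 vcores even though only
        // 2 vcores are actually needed:
        System.out.println(wholeMin(cluster, avail, needed));   // Res[memGB=2, vcores=3]
        // Componentwise min caps each dimension, as the comment proposes:
        System.out.println(componentwiseMin(avail, needed));    // Res[memGB=2, vcores=2]
    }
}
```

With the numbers from the comment, the whole-resource form over-accepts CPU, which is exactly what lets a sub-queue's IDEAL_ASSIGNED exceed used + pending.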
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/6/22 8:43 AM: --- This issue is caused by the following. When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. For example, if the cluster resource is (32GB, 16 cores), available is (2GB, 3 cores), and used + pending minus idealAssigned is (8GB, 2 cores), the accepted comes out as (2GB, 3 cores), so the assigned CPU is more than the pending CPU. !image-2022-04-06-16-29-57-871.png!
[jira] [Commented] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian commented on YARN-11108: --- This issue is caused by the following. When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. !image-2022-04-06-16-29-57-871.png!
[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-11108: -- Attachment: image-2022-04-06-16-29-57-871.png
[jira] [Created] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
Juanjuan Tian created YARN-11108: - Summary: Unexpected preemptions happen when hierarchy queues case Key: YARN-11108 URL: https://issues.apache.org/jira/browse/YARN-11108 Project: Hadoop YARN Issue Type: Improvement Reporter: Juanjuan Tian

Unexpected preemptions happen in the hierarchical queues case. The issue is that a sub-queue can accept more resource than used + pending, leading to other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption happens unexpectedly:

2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator: NAME: MSANRPAB PARTITION: persistent CUR: ... PEN: ... TOTAL_PEN: ... RESERVED: ... GAR: ... NORM: 0.3424696922302246 IDEAL_ASSIGNED: ... IDEAL_PREEMPT: ... ACTUAL_PREEMPT: ... UNTOUCHABLE: ... PREEMPTABLE: ... BONUS_WEIGHT: -1.0
[jira] [Resolved] (YARN-11101) Fix TestYarnConfigurationFields
[ https://issues.apache.org/jira/browse/YARN-11101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved YARN-11101. -- Resolution: Duplicate

> Fix TestYarnConfigurationFields
>
> Key: YARN-11101
> URL: https://issues.apache.org/jira/browse/YARN-11101
> Project: Hadoop YARN
> Issue Type: Bug
> Components: documentation, newbie
> Reporter: Akira Ajisaka
> Priority: Major
>
> yarn.resourcemanager.node-labels.am.default-node-label-expression is missing in yarn-default.xml.
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.533 s <<< FAILURE! - in org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] testCompareConfigurationClassAgainstXml Time elapsed: 0.082 s <<< FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing in yarn-default.xml Entries: yarn.resourcemanager.node-labels.am.default-node-label-expression expected:<0> but was:<1>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml(TestConfigurationFieldsBase.java:493)
> {noformat}
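For reference, the failing check requires every key declared in YarnConfiguration to have a matching entry in yarn-default.xml. A minimal entry for the missing key would look like the following sketch (the description text and empty default are illustrative assumptions, not the wording merged in the actual fix):

```xml
<property>
  <name>yarn.resourcemanager.node-labels.am.default-node-label-expression</name>
  <value></value>
  <description>
    Illustrative placeholder: default node-label expression applied to
    AM containers when the application does not specify one.
  </description>
</property>
```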
[jira] [Commented] (YARN-11101) Fix TestYarnConfigurationFields
[ https://issues.apache.org/jira/browse/YARN-11101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517924#comment-17517924 ] Akira Ajisaka commented on YARN-11101: -- Thank you [~zuston] for the information. I'll close this as duplicate.
[jira] [Commented] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517921#comment-17517921 ] Xiping Zhang commented on YARN-11107: - I think that when NodeLabel is enabled, the RM should consider the label of the application when passing the number of NMs to the AM. When the number of blacklisted nodes exceeds 33% of the total number of label nodes, the AM releases the NMs in its blacklist. In DefaultAMSProcessor.java:

{code:java}
final class DefaultAMSProcessor implements ApplicationMasterServiceProcessor {
  ...
  public void allocate(ApplicationAttemptId appAttemptId,
      AllocateRequest request, AllocateResponse response) throws YarnException {
    ...
    // Consider whether NodeLabel is enabled
    response.setNumClusterNodes(getScheduler().getNumClusterNodes());
    ...
  }
}
{code}
[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal
[ https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiping Zhang updated YARN-11107: Description: YARN NodeLabel is enabled in our production environment. We encountered an application whose AM blacklisted all NMs corresponding to the label in the queue, so other applications in the queue could not apply for computing resources. We found that the RM printed many logs of the form "Trying to fulfill reservation for application..." (was: YARN NodeLabel is enabled in the production environment. During application running, an AM blacklists all NMs corresponding to the label in the queue, and other applications in the queue cannot apply for computing resources. We found that the RM printed a lot of logs "Trying to fulfill reservation for application...")
[jira] [Created] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal
Xiping Zhang created YARN-11107: --- Summary: When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal Key: YARN-11107 URL: https://issues.apache.org/jira/browse/YARN-11107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.3.0, 2.9.2 Reporter: Xiping Zhang

YARN NodeLabel is enabled in our production environment. During application running, an AM blacklists all NMs corresponding to the label in the queue, and other applications in the queue cannot apply for computing resources. We found that the RM printed many logs of the form "Trying to fulfill reservation for application..."
[jira] [Commented] (YARN-11101) Fix TestYarnConfigurationFields
[ https://issues.apache.org/jira/browse/YARN-11101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517854#comment-17517854 ] Junfan Zhang commented on YARN-11101: - Sorry, this has already been fixed in https://github.com/apache/hadoop/pull/4121 [~aajisaka]