[ https://issues.apache.org/jira/browse/YARN-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238718#comment-17238718 ]
Aadithya commented on YARN-9449:
--------------------------------

Hi,
Can anyone provide a solution or workaround for this issue? It occurs frequently on EMR clusters where node labels are enabled.
I appreciate any help provided.

> Non-exclusive labels can create reservation loop on cluster without unlabeled node
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-9449
>                 URL: https://issues.apache.org/jira/browse/YARN-9449
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.5
>            Reporter: Brandon Scheller
>            Priority: Major
>
> https://issues.apache.org/jira/browse/YARN-5342 added a counter to YARN so that unscheduled resource requests are attempted on unlabeled nodes first.
> This counter is reset only when a scheduling attempt happens on an unlabeled node.
> On Hadoop clusters with only labeled nodes, this counter can never be reset, and therefore it blocks skipping that node.
> Because the node is not skipped, it creates the loop shown below in the YARN RM logs.
> This can block scheduling of a Spark executor, for example, and cause the Spark application to get stuck.
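The counter behavior described above can be illustrated with a small toy model. This is only a sketch built from the description in this issue, not the actual CapacityScheduler code: the names `Node`, `ToyApp`, `should_skip`, and `simulate` are hypothetical, and the skip threshold (a labeled node is skipped while the counter is at most the number of cluster nodes) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    label: str = ""  # empty string stands for an unlabeled node


class ToyApp:
    """Hypothetical stand-in for the per-application scheduling state."""

    def __init__(self):
        self.missed_opportunities = 0

    def should_skip(self, node, num_cluster_nodes):
        if node.label == "":
            # Per the issue description, the counter is reset only when a
            # scheduling attempt happens on an unlabeled node.
            self.missed_opportunities = 0
            return False
        self.missed_opportunities += 1
        # Assumed threshold: keep skipping the labeled node until the
        # counter exceeds the number of nodes in the cluster.
        return self.missed_opportunities <= num_cluster_nodes


def simulate(nodes, rounds):
    """Visit every node once per round and record (node name, skipped)."""
    app = ToyApp()
    decisions = []
    for _ in range(rounds):
        for node in nodes:
            decisions.append((node.name, app.should_skip(node, len(nodes))))
    return decisions


if __name__ == "__main__":
    # Only labeled nodes: the counter is never reset, so after the first
    # attempt the labeled node is never skipped again.
    print(simulate([Node("n1", "LABELED")], rounds=4))
    # Mixed cluster: the unlabeled node keeps resetting the counter.
    print(simulate([Node("n1", "LABELED"), Node("n2")], rounds=2))
```

In this model, a cluster with at least one unlabeled node keeps resetting the counter, so the labeled node continues to be skipped. With only labeled nodes the counter grows without bound, which matches the situation the reporter describes.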
>
> {{_2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000023 Container Transitioned from NEW to RESERVED
> 2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:22,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
> 2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
> 2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000024 Container Transitioned from NEW to RESERVED
> 2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
> 2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
> 2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000025 Container Transitioned from NEW to RESERVED
> 2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:25,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
> 2019-02-18 23:54:25,595 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED_}}
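For anyone trying to confirm whether a cluster is hitting this pattern, the reserve/unreserve cycle in the RM log above can be counted per application with a short script. This is a rough sketch that relies only on the two message texts visible in the quoted log ("Container Transitioned from NEW to RESERVED" and "unreserved on node host:"). The function name `find_reservation_loops` and the `min_cycles` threshold are hypothetical, and normal reservations also produce these lines, so a match is a hint rather than proof of this bug.

```python
import re
from collections import Counter

# container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>, with an
# optional epoch segment such as "e17_" after "container_".
RESERVED_RE = re.compile(
    r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+\s+"
    r"Container Transitioned from NEW to RESERVED"
)
UNRESERVED_RE = re.compile(
    r"Application (application_\d+_\d+) unreserved on node host:"
)


def find_reservation_loops(log_text, min_cycles=3):
    """Return {application_id: cycles} for applications that were reserved
    and then unreserved at least `min_cycles` times in `log_text`."""
    reserved = Counter()
    unreserved = Counter()
    for match in RESERVED_RE.finditer(log_text):
        # Derive the application ID from the container ID.
        reserved[f"application_{match.group(1)}_{match.group(2)}"] += 1
    for match in UNRESERVED_RE.finditer(log_text):
        unreserved[match.group(1)] += 1

    loops = {}
    for app in reserved:
        # One cycle = one reservation followed by one unreservation.
        cycles = min(reserved[app], unreserved[app])
        if cycles >= min_cycles:
            loops[app] = cycles
    return loops
```

Passing the contents of an RM log file to `find_reservation_loops` returns the applications whose containers keep moving from NEW to RESERVED and back, roughly once per second in the log quoted above.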