[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949586#comment-14949586 ]
Wangda Tan commented on YARN-4140:
----------------------------------

Thanks for the update, [~bibinchundatt], patch looks good, pending Jenkins.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> ------------------------------------------------------------------------------
>
>                 Key: YARN-4140
>                 URL: https://issues.apache.org/jira/browse/YARN-4140
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, client, resourcemanager
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition, I found that application execution is delayed by 5–10 minutes for 500 containers. The cluster had 3 machines in total; 2 of them were in the same partition, and the app was submitted to that partition.
> After enabling debug logging, I was able to find the following:
> # From the AM, the container ask is for OFF-SWITCH (see the request sketch after this list).
> # The RM keeps attempting to allocate all containers NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the first map container after the AM allocation.
> # Tested with about 1K maps using the Pi job, it took 17 minutes to allocate the next container after the AM allocation.
> Only once 500 NODE_LOCAL allocation attempts are done is the next container allocated OFF_SWITCH (a simplified sketch of the scheduler's delay check also follows below).
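> For context, a minimal sketch of what such an ask looks like from an AM using the {{AMRMClient}} API; the capability, priority, and node label expression "3" mirror the request in the logs below, while the surrounding setup ({{amRMClient}}) is assumed for illustration and is not the actual MR job client code:
> {code}
> import org.apache.hadoop.yarn.api.records.Priority;
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.client.api.AMRMClient;
>
> // Sketch only: the values mirror the request shown in the debug logs.
> Resource capability = Resource.newInstance(512, 1);
> Priority priority = Priority.newInstance(20);
> // nodes == null and racks == null makes this an ANY (OFF-SWITCH) ask;
> // the node label expression "3" pins it to the labeled partition.
> AMRMClient.ContainerRequest ask = new AMRMClient.ContainerRequest(
>     capability, null, null, priority, true, "3");
> // amRMClient is an already-started AMRMClient<ContainerRequest> (assumed).
> amRMClient.addContainerRequest(ask);
> {code}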
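> The repeated NODE_LOCAL attempts are consistent with delay scheduling: the scheduler defers an OFF_SWITCH assignment until the request has missed enough scheduling opportunities, and that threshold scales with the number of outstanding containers. A simplified sketch of that check (illustrative names, not the actual CapacityScheduler code):
> {code}
> // Simplified delay-scheduling check (names are illustrative):
> // OFF_SWITCH is allowed only after roughly one missed scheduling
> // opportunity per requested container, so 500 outstanding containers
> // mean ~500 NODE_LOCAL skips before the first OFF_SWITCH assignment.
> boolean canAssignOffSwitch(long missedOpportunities, int requestedContainers) {
>   return missedOpportunities >= requestedContainers;
> }
> {code}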
> {code}
> 2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: *, Relax Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
>
> {code}
> 2015-09-09 14:35:45,467 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1> cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep "root.b.b1" | wc -l
> 500
> {code}
>
> (Consumes about 6 minutes)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)