[ https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273267#comment-15273267 ]
Wangda Tan commented on YARN-5039:
----------------------------------

Judging from the [log|https://issues.apache.org/jira/secure/attachment/12802294/yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz], some nodes appear to be in the decommissioning state. We probably have a bug in how decommissioning nodes and their resources are shown on the web UI.
{code}
2016-05-04 21:00:10,182 INFO org.apache.hadoop.yarn.server.resourcemanager.DecommissioningNodesWatcher (IPC Server handler 44 on 8025): Decommissioning Nodes:
  ip-10-12-41-126.us-west-2.compute.internal 63833s fresh:41194s containers: 0 READY
  ip-10-12-36-61.us-west-2.compute.internal 62964s fresh:41194s containers: 0 READY
  ip-10-12-46-96.us-west-2.compute.internal 98343s fresh:98343s containers: 0 READY
  ...
{code}
IIRC, the scheduler will not assign containers to decommissioning nodes, which could be why your applications stay in the ACCEPTED state.

> Applications ACCEPTED but not starting
> --------------------------------------
>
>                 Key: YARN-5039
>                 URL: https://issues.apache.org/jira/browse/YARN-5039
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>         Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot
> 2016-05-04 at 2.41.22 PM.png, queue-config.log,
> resource-manager-application-starts.log.gz,
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the
> resourcemanager logs show that scheduling is being skipped. The scheduling is
> skipped because the application itself has reserved the node?
> I'm not sure how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
> (ResourceManager Event Processor): Trying to fulfill reservation for
> application application_1462291866507_0025 on node:
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
> (ResourceManager Event Processor): Reserved container
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1>
> queue=default: capacity=1.0, absoluteCapacity=1.0,
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589,
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:21,316 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
> (ResourceManager Event Processor): Skipping scheduling since node
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application
> appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,232 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
> (ResourceManager Event Processor): Trying to fulfill reservation for
> application application_1462291866507_0025 on node:
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
> (ResourceManager Event Processor): Reserved container
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1>
> queue=default: capacity=1.0, absoluteCapacity=1.0,
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589,
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,232 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
> (ResourceManager Event Processor): Skipping scheduling since node
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application
> appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,316 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
> (ResourceManager Event Processor): Trying to fulfill reservation for
> application application_1462291866507_0025 on node:
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
> (ResourceManager Event Processor): Reserved container
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1>
> queue=default: capacity=1.0, absoluteCapacity=1.0,
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589,
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,316 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
> (ResourceManager Event Processor): Skipping scheduling since node
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application
> appattempt_1462291866507_0025_000001
> {code}
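For anyone trying to read the numbers in the log lines quoted in this thread, here is a minimal Python sketch that pulls them out and does the arithmetic. The helper names `decommissioning_nodes` and `memory_mb` are hypothetical and not part of YARN. The meaning of each field in the watcher line (elapsed seconds, `fresh`, container count, state) is an assumption based only on how the line is printed, and memory values are assumed to be in MB. Note that the aggregate free memory says nothing about per-node headroom, and if the cluster total still counts the decommissioning nodes (not confirmed in this thread) the usable headroom is smaller than computed here.

```python
import re

# Excerpts copied from the ResourceManager log lines quoted in this thread.
WATCHER_LOG = (
    "Decommissioning Nodes: "
    "ip-10-12-41-126.us-west-2.compute.internal 63833s fresh:41194s containers: 0 READY "
    "ip-10-12-36-61.us-west-2.compute.internal 62964s fresh:41194s containers: 0 READY "
    "ip-10-12-46-96.us-west-2.compute.internal 98343s fresh:98343s containers: 0 READY"
)
RESERVED_LOG = (
    "Reserved container application=application_1462291866507_0025 "
    "resource=<memory:50688, vCores:1> queue=default: capacity=1.0, "
    "absoluteCapacity=1.0, usedResources=<memory:1894464, vCores:33>, "
    "usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, "
    "numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 "
    "used=<memory:1894464, vCores:33> cluster=<memory:2658304, vCores:704>"
)

# One entry per node: host, two durations, container count, state.
# The field interpretation is an assumption based on the printed format.
NODE_RE = re.compile(r"(\S+) (\d+)s fresh:(\d+)s containers: (\d+) (\w+)")


def decommissioning_nodes(text):
    """Return (host, containers, state) for each node entry in the watcher line."""
    return [
        (host, int(containers), state)
        for host, _elapsed, _fresh, containers, state in NODE_RE.findall(text)
    ]


def memory_mb(text, key):
    """Return the memory value of a `key=<memory:N, ...>` field."""
    match = re.search(re.escape(key) + r"=<memory:(\d+),", text)
    if match is None:
        raise ValueError(f"no {key}=<memory:...> field found")
    return int(match.group(1))


nodes = decommissioning_nodes(WATCHER_LOG)
requested = memory_mb(RESERVED_LOG, "resource")
used = memory_mb(RESERVED_LOG, "used")
cluster = memory_mb(RESERVED_LOG, "cluster")

ratio = used / cluster       # should agree with usedCapacity in the log line
free = cluster - used        # aggregate free memory across the whole cluster
fits = free // requested     # containers of the requested size that fit in aggregate

print(f"decommissioning nodes: {[host for host, _, _ in nodes]}")
print(f"used/cluster = {ratio:.4f}, free = {free}, requested = {requested}, fits = {fits}")
```

On these numbers, used/cluster reproduces the logged usedCapacity, and the aggregate free memory would hold several containers of the requested size. That is consistent with the reporter's observation that the cluster looks underutilized, and leaves per-node availability (including nodes that are decommissioning) as the thing to check.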