[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

Miles Crawford (JIRA) Wed, 11 May 2016 09:18:45 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280357#comment-15280357
 ]


Miles Crawford commented on YARN-5039:
--------------------------------------

Okay, wow, that is wonderful to know. We have a highly variable workload and 
make almost no use of HDFS, so we run a tiny number of CORE nodes and remove 
TASK nodes when the cluster is idle to save on costs. We do not use spot 
instances at all (because of https://issues.apache.org/jira/browse/SPARK-14209).

I cannot seem to find any mention of this behavior in the EMR documentation, so 
it's a bit of a blindside.

Additionally, the Node Labels page of the Hadoop UI does not distinguish 
between spot and task, so I wasn't even aware labeling was going on.

I guess things are working as designed, so I'm sorry to take up all your time. 
Thanks very much for helping. I think I'll follow up with AWS and request a 
documentation fix.

> Applications ACCEPTED but not starting
> --------------------------------------
>
>                 Key: YARN-5039
>                 URL: https://issues.apache.org/jira/browse/YARN-5039
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>         Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> whole-scheduler-at-debug.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to a cluster that is not fully utilized, 
> they sit, unable to start, for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanager logs show that scheduling is being skipped. Is the scheduling 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1> 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, 
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1> 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, 
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1> 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, 
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_000001
> {code}
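
One way to get a feel for the excerpt above is to tally how often each node is skipped because of a reservation. The following is a minimal Python sketch, assuming the ResourceManager log is available as plain text (without the "> " quote prefixes of this e-mail); `count_reserved_skips` and `SKIP_RE` are hypothetical names for illustration and are not part of YARN.

```python
import re
from collections import Counter

# Matches the "Skipping scheduling" message seen in the excerpt, once
# line wrapping has been undone (all whitespace collapsed to single spaces).
SKIP_RE = re.compile(
    r"Skipping scheduling since node (\S+) is reserved by application (\S+)"
)


def count_reserved_skips(log_text):
    """Return a Counter keyed by (node, app_attempt), one count per skip message."""
    # Collapse whitespace so a message wrapped across several lines still matches.
    flat = " ".join(log_text.split())
    return Counter(SKIP_RE.findall(flat))


# The three "Skipping scheduling" messages from the excerpt, wrapped as in the mail.
sample = """
2016-05-04 20:19:21,316 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
 (ResourceManager Event Processor): Skipping scheduling since node
ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application
appattempt_1462291866507_0025_000001
2016-05-04 20:19:22,232 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
 (ResourceManager Event Processor): Skipping scheduling since node
ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application
appattempt_1462291866507_0025_000001
2016-05-04 20:19:22,316 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
 (ResourceManager Event Processor): Skipping scheduling since node
ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application
appattempt_1462291866507_0025_000001
"""

if __name__ == "__main__":
    for (node, attempt), n in count_reserved_skips(sample).most_common():
        print(f"{n:3d}  {node}  {attempt}")
```

Run against the three messages quoted above, this counts two skips for ip-10-12-43-54 and one for ip-10-12-43-53, all attributed to the same application attempt. On a full log, a node that is skipped over and over for a single attempt is a hint that a reservation is being held there without being fulfilled.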



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
