[ 
https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063156#comment-14063156
 ] 

Wangda Tan commented on YARN-2297:
----------------------------------

Hi [~chris.douglas],
Thanks for jumping in.
bq. Does this occur when the absolute guaranteed capacity of a queue is smaller 
than the minimum container size?
This can happen when 
(used_capacity_of_a_queue + newly_allocated_container_resource > 
guaranteed_resource_of_a_queue) && (used_capacity_of_a_queue < 
guaranteed_resource_of_a_queue).
So I propose changing
{code}
while (toBePreempt > 0):
  foreach application:
    foreach container:
      if (toBePreempt > 0):
        do preemption
{code}
To
{code}
while (toBePreempt > 0):
  foreach application:
    foreach container:
      if (toBePreempt > 0) and (container.resource < toBePreempt * 2):
        do preemption
{code}
This makes sure a container is not preempted too aggressively: a container is 
only preempted if its resource is less than twice what is still to be preempted. 
Does this answer your question?
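
For illustration, here is a minimal Python sketch of the condition and the proposed guard. It is not the actual CapacityScheduler preemption code: the names Container, crosses_guarantee and select_for_preemption are hypothetical, resources are reduced to memory in MB, the outer while loop is collapsed into a single pass, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    name: str
    resource_mb: int  # memory held by this container, in MB (assumed unit)


def crosses_guarantee(used_mb: float, new_mb: float, guaranteed_mb: float) -> bool:
    """Condition from the comment: the queue is under its guarantee now,
    but one more container would push it over."""
    return used_mb + new_mb > guaranteed_mb and used_mb < guaranteed_mb


def select_for_preemption(apps: List[List[Container]], to_preempt_mb: int,
                          guarded: bool) -> List[Container]:
    """One pass over the nested loops; the outer `while` is omitted here."""
    selected: List[Container] = []
    remaining = to_preempt_mb
    for containers in apps:
        for container in containers:
            if remaining <= 0:
                return selected
            # Proposed guard: skip a container that is at least twice as
            # large as what is still left to preempt.
            if guarded and not (container.resource_mb < remaining * 2):
                continue
            selected.append(container)
            remaining -= container.resource_mb
    return selected


# Hypothetical numbers, not taken from the issue.
apps = [[Container("big", 2048)], [Container("small", 128)]]
unguarded = [c.name for c in select_for_preemption(apps, 100, guarded=False)]
guarded = [c.name for c in select_for_preemption(apps, 100, guarded=True)]
print(unguarded)  # ['big']
print(guarded)    # ['small']
print(crosses_guarantee(900, 200, 1000))  # True
```

With 100 MB left to preempt, the unguarded loop takes the 2048 MB container, while the guarded loop skips it and takes the 128 MB one.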

Thanks,
Wangda

> Preemption can hang in corner case by not allowing any task container to 
> proceed.
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-2297
>                 URL: https://issues.apache.org/jira/browse/YARN-2297
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.5.0
>            Reporter: Tassapol Athiapinya
>            Assignee: Wangda Tan
>            Priority: Critical
>
> Preemption can cause a hang in a single-node cluster. Only the AMs run. No task 
> container can run.
> h3. queue configuration
> Queues A and B have 1% and 99% capacity, respectively. 
> No max capacity is set.
> h3. scenario
> Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 
> 1 user.
> Submit app 1 to queue A. Its AM needs 2 GB. It has 1 task that needs 2 GB. 
> Together they occupy the entire cluster.
> Submit app 2 to queue B. Its AM needs 2 GB. It has 3 tasks that need 2 GB each.
> Instead of app 1 being preempted entirely, the app 1 AM stays and the app 2 AM 
> launches. No task of either app can proceed. 
> h3. commands
> /usr/lib/hadoop/bin/hadoop jar 
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter 
> "-Dmapreduce.map.memory.mb=2000" 
> "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" 
> "-Dmapreduce.randomtextwriter.bytespermap=2147483648" 
> "-Dmapreduce.job.queuename=A" "-Dmapreduce.map.maxattempts=100" 
> "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" 
> "-Dmapreduce.map.java.opts=-Xmx1800M" 
> "-Dmapreduce.randomtextwriter.mapsperhost=1" 
> "-Dmapreduce.randomtextwriter.totalbytes=2147483648" dir1
> /usr/lib/hadoop/bin/hadoop jar 
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
> "-Dmapreduce.map.memory.mb=2000" 
> "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" 
> "-Dmapreduce.job.queuename=B" "-Dmapreduce.map.maxattempts=100" 
> "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" 
> "-Dmapreduce.map.java.opts=-Xmx1800M" -m 1 -r 0 -mt 4000  -rt 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)
