Prabhu Joseph created YARN-7685:
-----------------------------------

             Summary: Preemption does not happen when a node label partition is fully utilized
                 Key: YARN-7685
                 URL: https://issues.apache.org/jira/browse/YARN-7685
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
    Affects Versions: 2.7.3
            Reporter: Prabhu Joseph
         Attachments: Screen Shot 2017-12-27 at 3.28.13 PM.png, Screen Shot 2017-12-27 at 3.28.20 PM.png, Screen Shot 2017-12-27 at 3.28.32 PM.png, Screen Shot 2017-12-27 at 3.31.42 PM.png, capacity-scheduler.xml

There are two queues, default and tkgrid, and two node labels, default (exclusivity=true) and tkgrid (exclusivity=false):

default queue: capacity 15%, maximum capacity 100%, default node label expression tkgrid
tkgrid queue: capacity 85%, maximum capacity 100%, default node label expression default
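
For illustration, the queue/label setup described above would look roughly like the following in capacity-scheduler.xml. This is a hedged sketch reconstructed from the description, not the attached file; property values and label ACLs may differ from the actual configuration:

{code:xml}
<!-- Sketch only: reconstructed from the issue description, not the attached file -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,tkgrid</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>15</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <!-- containers from the default queue land on the non-exclusive tkgrid partition -->
  <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
  <value>tkgrid</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>tkgrid</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tkgrid.capacity</name>
  <value>85</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tkgrid.maximum-capacity</name>
  <value>100</value>
</property>
{code}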

When the default queue has fully occupied the tkgrid node label partition, a new job submitted to the tkgrid queue with node label expression tkgrid waits in the ACCEPTED state forever, because there is no space left in the tkgrid partition for its ApplicationMaster. Preemption does not kick in for this scenario.

Attached: capacity-scheduler.xml and screenshots of the RM UI, Nodes, and Node Labels pages.
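
For context, CapacityScheduler preemption is normally switched on via the standard scheduler-monitor properties in yarn-site.xml; the sketch below shows those settings as an assumption about the test setup (the actual values in use are in the attached configuration, not reproduced here):

{code:xml}
<!-- Assumed preemption setup for this repro; actual values are in the attached config -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
{code}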

{code}
Repro Steps:

[yarn@bigdata3 root]$ yarn cluster --list-node-labels
Node Labels: <tkgrid:exclusivity=false>

Job 1, submitted into the default queue, utilizes the complete tkgrid node label partition:

yarn jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args 2h -timeout 7200000 -jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -queue default -num_containers 20

Job 2, submitted into the tkgrid queue with AM node label expression tkgrid, stays in ACCEPTED state forever:

yarn jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args 2h -timeout 7200000 -jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -queue tkgrid -node_label_expression tkgrid -num_containers 20

17/12/27 09:31:48 INFO distributedshell.Client: Got application report from ASM for, appId=5, clientToAMToken=null, appDiagnostics=[Wed Dec 27 09:31:39 +0000 2017] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = tkgrid ; Partition Resource = <memory:35840, vCores:56> ; Queue's Absolute capacity = 85.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; , appMasterHost=N/A, appQueue=tkgrid, appMasterRpcPort=-1, appStartTime=1514367099792, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://bigdata3.openstacklocal:8088/proxy/application_1514366265793_0005/, appUser=yarn

{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
