[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307191#comment-17307191 ]

Michael Zeoli edited comment on YARN-6538 at 3/23/21, 4:03 PM:
---------------------------------------------------------------

Eric - thanks for the response, and apologies for the absence.  So far we have 
not been able to reproduce this outside of our particular pipeline, though we 
stopped pursuing that in earnest once our platform vendor indicated they were 
able to reproduce it with a purpose-built MR job (we are currently working the 
issue with them).  I will try to get details.

Essentially what we see is a single job (in lq1) with several thousand pending 
containers taking the entire cluster (expected, via dynamic allocation).  When 
a second job enters lq2, it fails to receive executors despite lq2 having a 
guaranteed minimum capacity of 17% (approx. 4 cores: 28 * 0.95 * 0.17).  On 
occasion it also fails to receive an AM.  If a third job enters lq3 at this 
point, it also fails to receive executors.  The later jobs continue to starve 
until the first job's pending containers fall to zero and it begins releasing 
resources.
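
Spelled out, that guaranteed share for lq2 works out to roughly 28 vcores * 
0.95 * 0.17 ~= 4.5 vcores and 280 GiB * 0.95 * 0.17 ~= 45 GiB (arithmetic from 
the figures above, so approximate).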

 

YARN Resources  (4 NMs, so 280 GiB / 28 vcores of total YARN resources; the 
MB-valued form, as YARN actually reads it, is sketched after this list)
 * yarn.nodemanager.resource.cpu-vcores = 7
 * yarn.scheduler.maximum-allocation-vcores = 7
 * yarn.nodemanager.resource.memory-mb = 70 GiB
 * yarn.scheduler.maximum-allocation-mb = 40 GiB
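
Since the memory properties take MB values, the same settings as YARN reads 
them would be roughly the following (a sketch; 70 GiB = 71680 MB and 40 GiB = 
40960 MB):

{code}
yarn.nodemanager.resource.cpu-vcores=7
yarn.scheduler.maximum-allocation-vcores=7
yarn.nodemanager.resource.memory-mb=71680
yarn.scheduler.maximum-allocation-mb=40960
{code}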

 
  
Queue configuration  (note that only lq1, lq2 and lq3 are used in the current 
tests; the layout is also sketched as capacity-scheduler properties after the 
leaf-queue settings below)
 * root.default cap = 5%
 * root.tek cap = 95%
 * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each
 * root.tek.lq5, .lq6 cap = 16% each

 

For all lqN (leaf queues):  
 * Minimum User Limit = 25%
 * User Limit Factor = 100  (intentionally set high to allow a user to exceed 
queue capacity when idle capacity exists)
 * max cap = 100%
 * max AM res limit = 20%
 * inter / intra queue preemption: Enabled
 * ordering policy = Fair
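
For reference, the queue layout and per-leaf settings above correspond roughly 
to the following capacity-scheduler properties (a sketch showing lq1 only; lq2 
through lq6 follow the same pattern, and preemption enablement is configured 
separately via the scheduler-monitor settings, so it is not shown here):

{code}
yarn.scheduler.capacity.root.queues=default,tek
yarn.scheduler.capacity.root.default.capacity=5
yarn.scheduler.capacity.root.tek.capacity=95
yarn.scheduler.capacity.root.tek.queues=lq1,lq2,lq3,lq4,lq5,lq6
yarn.scheduler.capacity.root.tek.lq1.capacity=17
yarn.scheduler.capacity.root.tek.lq1.maximum-capacity=100
yarn.scheduler.capacity.root.tek.lq1.minimum-user-limit-percent=25
yarn.scheduler.capacity.root.tek.lq1.user-limit-factor=100
yarn.scheduler.capacity.root.tek.lq1.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.root.tek.lq1.ordering-policy=fair
{code}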

 

Spark config  (this is our default Spark config, though some of the Spark jobs 
in the pipelines we're testing set executor memory and overhead memory higher 
to support more memory-intensive work; our workload is memory constrained, and 
additional cores per executor have never yielded better throughput).  The 
rough per-container arithmetic is spelled out after this list.
 * spark.executor.cores=1
 * spark.executor.memory=5G
 * spark.driver.memory=4G
 * spark.driver.maxResultSize=2G
 * spark.executor.memoryOverhead=1024
 * spark.dynamicAllocation.enabled = true
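
In container terms (rough arithmetic from the settings above, ignoring any 
rounding to the scheduler's minimum allocation): each executor asks YARN for 
about 5 GiB + 1 GiB overhead = 6 GiB and 1 vcore, and the driver/AM for about 
4 GiB plus its default overhead.  At roughly 6 GiB per vcore, against a 
cluster that offers 10 GiB per vcore (280 GiB / 28 vcores), vcores are 
exhausted while memory is still free, which looks like the same 
vcores-exhausted / memory-left-over shape analyzed in the issue description 
quoted below.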

 



> Inter Queue preemption is not happening when DRF is configured
> --------------------------------------------------------------
>
>                 Key: YARN-6538
>                 URL: https://issues.apache.org/jira/browse/YARN-6538
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, scheduler preemption
>    Affects Versions: 2.8.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Major
>
> Cluster capacity of <memory:3TB, vCores:168>. Here memory is more and vcores 
> are less. If applications have more demand, vcores might be exhausted. 
> Inter queue preemption ideally has to be kicked in once vcores is over 
> utilized. However preemption is not happening.
> Analysis:
> In {{AbstractPreemptableResourceCalculator.computeFixpointAllocation}}, 
> {code}
>     // assign all cluster resources until no more demand, or no resources are
>     // left
>     while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
>         unassigned, Resources.none())) {
> {code}
>  will loop even when vcores are 0 (because memory is still +ve). Hence we are 
> having more vcores in idealAssigned which cause no-preemption cases.
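
For anyone following the analysis above, the loop predicate can be seen in 
isolation with a small, self-contained sketch (the class name and sample 
numbers below are purely illustrative, not from any patch): with 
{{DominantResourceCalculator}}, an "unassigned" resource whose vcores are 
already zero but whose memory is still positive compares greater than 
{{Resources.none()}}, so {{computeFixpointAllocation}} keeps looping.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfLoopPredicateSketch {
  public static void main(String[] args) {
    ResourceCalculator rc = new DominantResourceCalculator();

    // Cluster shaped like the description: plenty of memory, few vcores (3 TB / 168 vcores).
    Resource totGuarant = Resource.newInstance(3 * 1024 * 1024, 168);

    // "unassigned" after vcores are exhausted but memory is still positive (512 GB / 0 vcores).
    Resource unassigned = Resource.newInstance(512 * 1024, 0);

    // With DRF the dominant (memory) share is still > 0, so this prints "true" and the
    // fix-point loop keeps assigning, inflating idealAssigned vcores past what actually exists.
    System.out.println(Resources.greaterThan(rc, totGuarant, unassigned, Resources.none()));
  }
}
{code}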


