[jira] [Commented] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to situation where it appears that applications in queue are getting starved or stuck

Karam Singh (JIRA) Tue, 19 Jan 2016 01:46:49 -0800

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106507#comment-15106507 ]


Karam Singh commented on YARN-4606:
-----------------------------------

From offline discussion with [~wangda]:
After looking at the log and the code, I think I understand what happened.
The root cause is that we activate an application (and count its user as an
active user) while the application is still in the pending state; we shouldn't.
This is not a new issue: at least branch-2.6 contains it.
As a result, the number of active users in the queue increases, but the newly
added active user cannot get resources (because its application is still
pending), while the existing user hits its user limit (because the newly added
user lowers the per-user limit).
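To make the mechanism concrete, here is a minimal, hypothetical model of a per-user limit. It assumes a simplified formula (the queue's capacity split evenly among active users, floored by the minimum user-limit percentage, then capped by user-limit-factor times the guaranteed capacity and by the maximum capacity); this approximates CapacityScheduler behaviour but is not its actual code. The function names, the 43-container guaranteed capacity, and the user/state list are illustrative assumptions; 345 containers, 25%, factor 32 and 4 users come from the report.

```python
import math


def user_limit(current_capacity, active_users, min_user_limit_percent,
               queue_capacity, user_limit_factor, max_capacity):
    """Hypothetical, simplified per-user limit in containers."""
    n = max(1, active_users)
    # Even share among active users, but never below the minimum percentage.
    limit = max(math.ceil(current_capacity / n),
                math.ceil(current_capacity * min_user_limit_percent / 100))
    # One user is also capped by user-limit-factor and by maximum capacity.
    return min(limit, math.floor(queue_capacity * user_limit_factor), max_capacity)


def count_active_users(apps, count_pending):
    """apps: list of (user, state); returns the number of distinct active users."""
    return len({user for user, state in apps
                if state == "RUNNING" or (count_pending and state == "PENDING")})


# Hypothetical snapshot: only u1 has a running app; u2..u4 have pending apps.
apps = [("u1", "RUNNING"), ("u2", "PENDING"), ("u3", "PENDING"), ("u4", "PENDING")]

# Assumed numbers: 345 usable containers, 43 guaranteed containers.
params = dict(current_capacity=345, min_user_limit_percent=25,
              queue_capacity=43, user_limit_factor=32, max_capacity=345)

buggy = user_limit(active_users=count_active_users(apps, True), **params)
fixed = user_limit(active_users=count_active_users(apps, False), **params)
print(buggy, fixed)  # 87 345
```

Counting the three users whose applications are only pending shrinks u1's limit from 345 to 87 containers, even though none of those users can consume the freed share yet.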


> Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor
> in queue leads to situation where it appears that applications in queue are
> getting starved or stuck
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>
> Encountered while studying the behaviour of fairness with UserLimitPercent and
> UserLimitFactor during the following test:
> Ran GridMix with queue settings Capacity=10, MaxCap=80, UserLimit=25,
> UserLimitFactor=32, FairOrderingPolicy only. Encountered an application
> starvation situation in which 33 applications were running (190 of 761 apps
> had completed; the queue can hold 345 containers) with a total of only 45
> containers. The 12 containers beyond the AMs all went to a single app (which
> had around 18000 tasks); every other app had only its AM running and was given
> no other containers. After that app finished, 32 AMs kept running without any
> task containers being launched.
> GridMix was run with the following settings:
> gridmix.client.pending.queue.depth=10
> gridmix.job-submission.policy=REPLAY
> gridmix.client.submit.threads=5
> gridmix.submit.multiplier=0.0001
> gridmix.job.type=SLEEPJOB
> mapreduce.framework.name=yarn
> mapreduce.job.queuename=hive1
> mapred.job.queue.name=hive1
> gridmix.sleep.max-map-time=5000
> gridmix.sleep.max-reduce-time=5000
> gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver
> RoundRobinUserResolver was given a users file containing 4 users.
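The queue settings in the report can be read as CapacityScheduler properties. The sketch below renders them as a capacity-scheduler.xml fragment. It assumes the queue path is root.hive1 (the report names only "hive1"), and that UserLimit maps to minimum-user-limit-percent and FairOrderingPolicy to ordering-policy=fair; treat the mapping as an illustration, not the reporter's actual configuration file.

```python
import xml.etree.ElementTree as ET

# Assumption: "hive1" sits directly under "root".
QUEUE = "root.hive1"
PREFIX = f"yarn.scheduler.capacity.{QUEUE}"

# Values from the report; the property names are the assumed mapping.
settings = {
    f"{PREFIX}.capacity": "10",                     # Capacity=10
    f"{PREFIX}.maximum-capacity": "80",             # MaxCap=80
    f"{PREFIX}.minimum-user-limit-percent": "25",   # UserLimit=25
    f"{PREFIX}.user-limit-factor": "32",            # UserLimitFactor=32
    f"{PREFIX}.ordering-policy": "fair",            # FairOrderingPolicy
}


def to_hadoop_xml(props):
    """Render a dict as a Hadoop-style <configuration> XML string."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


print(to_hadoop_xml(settings))
```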



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
