[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4606:
-----------------------------
    Description: 
Currently, if all applications belonging to the same user in a LeafQueue are 
pending (because of max-am-percent, etc.), ActiveUsersManager still counts that 
user as an active user. This can lead to starvation of active applications, 
for example:
- app1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
to user3) and app4 (belongs to user4) are pending
- ActiveUsersManager returns #active-users=4
- However, only two users (user1/user2) are able to allocate new resources, so 
the computed user-limit-resource could be lower than expected.
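
The effect in the example above can be illustrated with a minimal sketch. It is 
a simplified model, not the actual CapacityScheduler code: it assumes the 
per-user limit is roughly the queue resource divided by the number of active 
users, floored by a minimum user-limit percentage. The class name, method name, 
and numbers below are hypothetical.

```java
public class UserLimitSketch {
    // Simplified model (assumption): each active user gets an equal share of
    // the queue resource, but never less than the minimum user-limit percent.
    static long userLimit(long queueResource, int activeUsers, int minUserLimitPercent) {
        long equalShare = queueResource / Math.max(activeUsers, 1);
        long minShare = queueResource * minUserLimitPercent / 100;
        return Math.max(equalShare, minShare);
    }

    public static void main(String[] args) {
        long queueResource = 100;     // hypothetical queue size, in containers
        int minUserLimitPercent = 10; // hypothetical minimum user-limit percent

        // Users with only pending apps are counted: 4 "active" users.
        System.out.println(userLimit(queueResource, 4, minUserLimitPercent));
        // Only users who can actually allocate are counted: 2 active users.
        System.out.println(userLimit(queueResource, 2, minUserLimitPercent));
    }
}
```

Under these assumptions, counting user3 and user4 halves the limit available to 
user1 and user2, even though user3 and user4 cannot use their share.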

  was:
Encountered while studying fairness behaviour with UserLimitPercent and 
UserLimitFactor during the following test:
Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, 
UserLimitFactor=32, FairOrderingPolicy only. Encountered an application 
starvation situation where 33 applications were running (190 apps completed out 
of 761; the queue can run 345 containers) with a total of 45 containers. The 12 
extra containers all belonged to one app (which had around 18000 tasks); all 
other apps had only their AM running, and no other containers were given to 
them. After that app finished, there were 32 AMs that kept running without any 
task containers being launched.
GridMix was run with the following settings:
gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, 
gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, 
gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, 
mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, 
gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, 
gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver
 with a users file containing 4 users for RoundRobinUserResolver


> CapacityScheduler: applications could get starved because #activeUsers 
> considers pending apps
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (because of max-am-percent, etc.), ActiveUsersManager still counts that 
> user as an active user. This can lead to starvation of active applications, 
> for example:
> - app1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, so 
> the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
