[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496239#comment-14496239
 ] 

Thomas Graves commented on YARN-3434:
-------------------------------------

I had considered putting it in ResourceLimits, but ResourceLimits seems to me 
to be more of a queue-level thing (not user-level). For instance, ParentQueue 
passes it into LeafQueue, and ParentQueue cares nothing about user limits. If 
you stored it there, you would need to track either the user it was for or the 
values for all users. ResourceLimits gets updated when nodes are added and 
removed, and we don't need to compute a particular user limit when that 
happens. So the value would either go out of date, or we would have to change 
the code to update it at that point, which to me is a fairly large change and 
not really needed.

The user limit calculations happen lower down and are recomputed regularly per 
user, per application, and per current request, so putting this into the global 
ResourceLimits didn't make sense to me given how it is calculated and used. All 
you would be using it for is passing it down to assignContainer, and after that 
it would be out of date. If someone else started looking at that value assuming 
it was up to date, it would be wrong (unless, of course, we started updating it 
as stated above). Even then it would only be for a single user, not all users, 
unless again we changed to calculate it for every user whenever something 
changed. That seems a bit excessive.
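
As a rough illustration of the staleness concern, the Java sketch below caches 
a user limit in a shared holder and then lets the inputs change. Everything in 
it is hypothetical: SharedLimits stands in for a queue-level object, and 
computeUserLimit uses a deliberately simplified formula (queue capacity divided 
by the number of active users), not the actual CapacityScheduler computation.

```java
// Hypothetical sketch of the staleness problem; not CapacityScheduler code.
final class StaleUserLimitSketch {

    /** Stand-in for shared queue-level state that is refreshed only on node changes. */
    static final class SharedLimits {
        long cachedUserLimitMb; // assumed field: written for ONE user during one assignment
    }

    /** Simplified assumption: each active user gets an equal share of the queue. */
    static long computeUserLimit(long queueCapacityMb, int activeUsers) {
        return queueCapacityMb / Math.max(1, activeUsers);
    }

    public static void main(String[] args) {
        SharedLimits shared = new SharedLimits();

        // Value computed while two users are active, then stored in shared state.
        shared.cachedUserLimitMb = computeUserLimit(100_000L, 2);

        // A third user becomes active; nothing refreshes the cached value.
        long freshLimitMb = computeUserLimit(100_000L, 3);

        System.out.println("cached=" + shared.cachedUserLimitMb
                + " fresh=" + freshLimitMb);
    }
}
```

Anyone reading the cached value after the third user arrives sees a limit that 
is too high, unless every such change also triggers a recomputation.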

You are correct that needToUnreserve could go away. I started out on 2.6, which 
didn't have our changes, and I could have removed it when I added 
amountNeededUnreserve. If we were to store it in the global ResourceLimits, then 
yes, the entire LimitsInfo can go away, including shouldContinue, as you would 
fall back to using the boolean return from each function. But again, based on my 
comments above, I'm not sure ResourceLimits is the correct place to put this.

I just noticed that we already keep the userLimit in the User class, so that 
would be another option. But again, I think we need to make clear what it is. 
This particular check is done per application, per user, based on the currently 
requested Resource. The stored value wouldn't necessarily apply to all of the 
user's applications, since the resource request size could be different.
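
The dependence on request size can be sketched in Java as follows. Only the 
names LimitsInfo, shouldContinue and amountNeededUnreserve come from this 
discussion; the field types, the plain megabyte arithmetic, and the 
checkUserLimit helper are assumptions made for illustration, not the patch code.

```java
// Hypothetical sketch; names other than LimitsInfo, shouldContinue and
// amountNeededUnreserve are invented for illustration.
final class UserLimitCheckSketch {

    /** Per-call result, valid only for the request it was computed for. */
    static final class LimitsInfo {
        final boolean shouldContinue;
        final long amountNeededUnreserve; // MB to unreserve first, 0 if none

        LimitsInfo(boolean shouldContinue, long amountNeededUnreserve) {
            this.shouldContinue = shouldContinue;
            this.amountNeededUnreserve = amountNeededUnreserve;
        }
    }

    /**
     * Assumed rule: the request fits if usage plus the request stays within
     * the user limit, or if unreserving part of this application's own
     * reservations would make it fit.
     */
    static LimitsInfo checkUserLimit(long userLimitMb, long userUsedMb,
                                     long appReservedMb, long requestMb) {
        long overMb = userUsedMb + requestMb - userLimitMb;
        if (overMb <= 0) {
            return new LimitsInfo(true, 0L);
        }
        if (overMb <= appReservedMb) {
            return new LimitsInfo(true, overMb);
        }
        return new LimitsInfo(false, 0L);
    }

    public static void main(String[] args) {
        // Same user state, two applications asking for different sizes.
        LimitsInfo small = checkUserLimit(64_000L, 60_000L, 8_000L, 2_000L);
        LimitsInfo large = checkUserLimit(64_000L, 60_000L, 8_000L, 8_000L);
        System.out.println(small.amountNeededUnreserve
                + " vs " + large.amountNeededUnreserve);
    }
}
```

With identical user state, the two requests produce different results, so a 
single value stored on the User would be right for at most one of them.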

Thoughts, or is there something I'm missing about ResourceLimits?

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3434
>                 URL: https://issues.apache.org/jira/browse/YARN-3434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
