[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709832#comment-15709832
 ] 

Eric Payne commented on YARN-5889:
----------------------------------

{quote}
bq. It seems like this should be longer than 1 ms.
It could be possible that containers are released and created very fast in a 
big cluster.
{quote}

[~sunilg], I now realize that with this design, the {{preComputedUserLimit}} 
cache will become out of date very quickly if the 
{{ComputeUserLimitAsyncThread}} thread is not run in a very tight loop. Even 
with that, {{preComputedUserLimit}} could still be out of date at the moment 
the scheduler needs to fill a large request.

On the other hand, with this design the user limit resource is being calculated 
a lot more often than it is currently. Currently, it is only being calculated 
during the scheduler loop, and only then for apps that are asking for 
resources. However, this design calculates it twice every millisecond (once 
with partition exclusivity and once without). If a cluster is not full and has 
mostly apps with long-running containers, then this is being calculated 
thousands of times per second when it doesn't need to be.

Instead, could we add a boolean flag to {{UserToPartitionRecord}}? This flag 
would be set when a container is allocated or released for an app from that 
user. Then, whenever {{getComputedUserLimit}} is called, if the flag is set, it 
calls {{computeUserLimit}} and clears the flag. What do you think?
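
A minimal sketch of the flag idea, not taken from any patch on this issue. The 
class {{UserLimitCache}}, the method {{markChanged}}, and the use of a plain 
{{long}} for the limit are hypothetical simplifications; only 
{{getComputedUserLimit}} and {{computeUserLimit}} are names from the 
discussion above, and the real locking would have to fit the scheduler's own.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the per-user, per-partition record.
final class UserLimitCache {
    // Stand-in for computeUserLimit; the real method works on Resource objects.
    private final LongSupplier computeUserLimit;
    // Starts true so the first read always computes a value.
    private boolean changed = true;
    private long cachedLimit;

    UserLimitCache(LongSupplier computeUserLimit) {
        this.computeUserLimit = computeUserLimit;
    }

    // Called when a container is allocated or released for this user.
    synchronized void markChanged() {
        changed = true;
    }

    // Recomputes only if something changed since the last read.
    synchronized long getComputedUserLimit() {
        if (changed) {
            cachedLimit = computeUserLimit.getAsLong();
            changed = false;
        }
        return cachedLimit;
    }

    public static void main(String[] args) {
        long[] availableMb = {1000};
        UserLimitCache cache = new UserLimitCache(() -> availableMb[0] / 2);
        System.out.println(cache.getComputedUserLimit()); // computed: 500
        availableMb[0] = 4000;
        System.out.println(cache.getComputedUserLimit()); // flag not set: still 500
        cache.markChanged();
        System.out.println(cache.getComputedUserLimit()); // recomputed: 2000
    }
}
```

With this shape, a cluster whose containers rarely change would compute the 
limit only after an allocation or release, and a reader right after a change 
would never see a stale value.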


> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently, user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
