[
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711647#comment-15711647
]
Sunil G commented on YARN-5889:
-------------------------------
Thanks [~eepayne]
Yes. For the scheduler, 1ms is also on the smaller side. It was a tradeoff to
see the performance gain and its impact. With the SLS test, I could see a good
improvement in allocation speed.
Now, to bridge the gap, there are 2 cases to consider:
- How do we make sure that every allocation gets a correct and accurate
user-limit value, given that the computation happens every 1ms?
- In a lousy cluster, how can we save CPU cycles by avoiding too many
unnecessary computations?
Yes, the ideal way is as you suggested.
- Any change in resource (allocation or release of a container, etc.) for a
given user could set a state variable. The computation thread will clear it if
its next cycle follows immediately.
- It's not ideal to make the allocation thread wait for the computation. So on
seeing this state variable set, we might need to compute the user-limit in the
same allocation thread.
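The two steps above can be sketched roughly as follows. This is only an
illustration and not the code in the attached patches: the class
UserLimitCache, its methods, and the "capacity minus used" formula are
hypothetical stand-ins for the real user-limit computation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names come from the YARN-5889 patches. */
final class UserLimitCache {
    // State variable: set on any allocation/release, cleared by whoever recomputes.
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final long queueCapacity;
    private long usedResource;        // guarded by "this"
    private int recomputeCount;       // guarded by "this", kept only for illustration
    private volatile long cachedLimit;

    UserLimitCache(long queueCapacity) {
        this.queueCapacity = queueCapacity;
        this.cachedLimit = queueCapacity;
    }

    /** Allocation (positive delta) or release (negative delta) for the user. */
    synchronized void onResourceChange(long delta) {
        usedResource += delta;
        dirty.set(true);
    }

    /** Called by the periodic computation thread on every cycle. */
    void periodicRecompute() {
        // An unchanged user costs only one failed CAS, so idle cycles stay cheap.
        if (dirty.compareAndSet(true, false)) {
            recompute();
        }
    }

    /** Called by the allocation thread (heartbeat). */
    long getUserLimit() {
        // If the flag is still set, the computation thread has not caught up,
        // so compute in this thread instead of waiting for the next cycle.
        if (dirty.compareAndSet(true, false)) {
            recompute();
        }
        return cachedLimit;
    }

    synchronized int recomputeCount() {
        return recomputeCount;
    }

    private synchronized void recompute() {
        // Stand-in formula (assumption): remaining capacity, never negative.
        cachedLimit = Math.max(0L, queueCapacity - usedResource);
        recomputeCount++;
    }
}
```

In this sketch the flag is cleared before the recomputation starts, so a change
that lands in between sets it again and only costs one extra recomputation
later.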
In the second step, I was looking at how much impact a slightly stale
user-limit can cause. We may over-allocate or we may under-allocate. I think
the under-allocate scenario is fine, as we will allocate more from the next
millisecond. However, the over-allocate scenario may be a worry. Still, we have
preemption/opportunistic ways to handle this.
Ideally, we were looking to avoid computing the user-limit in the allocation
thread itself. So after step 1), we can force the user-allocate thread to push
for an immediate computation. Still, there could be some exceptionally rare
case where the user-limit thread is doing a computation triggered by a
release/allocate demand while another allocation thread (heartbeat) runs in the
same time frame. If this is fine, I could update my patch to handle this case.
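One way to picture the "push for an immediate computation" idea is a
background thread that normally wakes up once per interval but can be signalled
early. This is a sketch under assumptions, not the patch itself:
UserLimitRefresher and its methods are hypothetical, and the LongSupplier
stands in for the real user-limit computation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these names come from the YARN-5889 patches. */
final class UserLimitRefresher implements Runnable {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeUp = lock.newCondition();
    private final LongSupplier computation;   // stand-in for the real computation
    private final long intervalNanos;
    private boolean requested;                // guarded by lock
    private volatile boolean running = true;
    private volatile long limit;

    UserLimitRefresher(LongSupplier computation, long intervalMillis) {
        this.computation = computation;
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        this.limit = computation.getAsLong();
    }

    /** Called after an allocation/release to cut the current wait short. */
    void requestImmediateComputation() {
        lock.lock();
        try {
            requested = true;
            wakeUp.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Non-blocking read for allocation threads; may be slightly old. */
    long currentLimit() {
        return limit;
    }

    void stop() {
        running = false;
        requestImmediateComputation();
    }

    @Override
    public void run() {
        while (running) {
            lock.lock();
            try {
                long remaining = intervalNanos;
                // Wait for the regular interval unless a request arrives first.
                while (!requested && remaining > 0) {
                    remaining = wakeUp.awaitNanos(remaining);
                }
                requested = false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } finally {
                lock.unlock();
            }
            // Computed outside the lock so requesters never wait on it.
            limit = computation.getAsLong();
        }
    }
}
```

A heartbeat that calls currentLimit() while a requested computation is still
running simply sees the previous value, which is the rare overlap described
above.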
Thoughts?
> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch,
> YARN-5889.v2.patch
>
>
> Currently, user-limit is computed during every heartbeat allocation cycle with
> a write lock. To improve performance, this ticket focuses on moving the
> user-limit calculation out of the heartbeat allocation flow.