Hi Jan!

Yes! Makes sense. I'm sure there were bigger changes for the
ResourceHandler. Which version are you on?

Cheers
Ravi

On Thu, Aug 11, 2016 at 7:48 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz>
wrote:

> Hi Ravi,
>
> I don't think cgroups will help us, because we don't want to impose a
> hard limit on memory usage; we just want to allow short periods during
> which a container can consume more memory than its limit. We don't want
> to set the limit too high, because that causes underutilization of our
> cluster, but setting it to a "reasonable" value causes applications to
> fail (random containers get killed because of spikes). That's why we
> created the time-window averaging resource calculator, and I was trying
> to find out whether anybody else has similar issues. If so, I could
> contribute our extension (and then we wouldn't have to maintain it
> ourselves in a separate repository :)). The resource calculator is for
> Hadoop 2.6, and I suppose there might be larger changes in this area in
> later versions?
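>
> To make "contributing the extension" concrete: if I remember correctly,
> the per-container memory reader used by the container monitor is
> pluggable in the NodeManager, so a smoothing calculator could be dropped
> in via configuration. A rough sketch (the property name is from memory,
> and the class name and the window property are hypothetical, not part of
> upstream Hadoop):
>
>   import org.apache.hadoop.conf.Configuration;
>
>   public class NodeManagerWiring {
>     public static void main(String[] args) {
>       Configuration conf = new Configuration();
>       // Plug a custom process-tree implementation into the container
>       // monitor; it would average memory readings over a time window.
>       conf.set("yarn.nodemanager.container-monitor.process-tree.class",
>           "cz.seznam.yarn.WindowedResourceCalculatorProcessTree");
>       // Hypothetical knob for the averaging window length.
>       conf.setLong("yarn.nodemanager.container-monitor.window-average-ms",
>           5 * 60 * 1000L);
>     }
>   }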
>
> Cheers,
>  Jan
>
> On 10.8.2016 19:23, Ravi Prakash wrote:
>
> Hi Jan!
>
> Thanks for your explanation. I'm glad that works for you! :-)
> https://issues.apache.org/jira/browse/YARN-5202 is something that Yahoo!
> talked about at the Hadoop Summit (and it seems the community may be
> going in a similar direction, although not exactly the same). There's also
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java .
> Ideally, at my company we'd also like memory limits to be imposed by
> cgroups, because we have had the OOM killer wreak havoc a couple of times,
> but from what I know, that is not an option yet.
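>
> For context, imposing memory limits via cgroups (v1) boils down to writing
> a byte value into the group's memory.limit_in_bytes file; the kernel's OOM
> killer then acts inside the group when the limit is exceeded. A minimal
> illustration of just that step (the path and group name are made up, and
> this is not how the NodeManager's CGroupsHandler is actually invoked):
>
>   import java.nio.charset.StandardCharsets;
>   import java.nio.file.Files;
>   import java.nio.file.Paths;
>
>   public class CgroupMemoryLimit {
>     public static void main(String[] args) throws Exception {
>       // Hypothetical cgroup created for a single container.
>       String group = "/sys/fs/cgroup/memory/hadoop-yarn/container_01";
>       long limitBytes = 2L * 1024 * 1024 * 1024;  // 2 GiB hard limit
>       // Writing the value makes the kernel enforce it immediately.
>       Files.write(Paths.get(group, "memory.limit_in_bytes"),
>           Long.toString(limitBytes).getBytes(StandardCharsets.UTF_8));
>     }
>   }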
>
> Cheers
> Ravi
>
> On Wed, Aug 10, 2016 at 1:54 AM, Jan Lukavský <
> jan.lukav...@firma.seznam.cz> wrote:
>
>> Hi Ravi,
>>
>> we don't run into the situation where memory used > RAM, because the
>> memory configured for all containers on a node is less than the total
>> amount of memory (by a margin of, say, 10%). The spikes in container
>> memory usage that are tolerated thanks to the averaging don't happen on
>> all containers at once; they are random in nature, so usually only a
>> single running container "spikes" at a time, which doesn't cause any
>> issues. To fully answer your question: we have overcommit enabled, so if
>> we did run out of memory, bad things would happen. :) We are aware of
>> that. The risk of hitting the OOM killer can be controlled by the
>> averaging window length - the longer the window, the more spikes are
>> tolerated. Setting the window length to 1 switches the feature off,
>> turning it back into the "standard" behavior, which is why I see it as
>> an extension of the current approach that could be interesting to other
>> people as well.
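>>
>> As a concrete illustration of the degenerate case (the helper below is
>> mine, not from the patch): with a sample-count window, the average of the
>> last N readings is compared to the limit, and N = 1 is exactly the
>> standard instantaneous check:
>>
>>   // Average of the last 'window' memory samples (oldest first).
>>   // With window == 1 this returns just the latest sample, i.e. the
>>   // standard behavior of comparing current usage to the limit.
>>   static long windowedAverage(long[] samples, int window) {
>>     int n = Math.min(window, samples.length);
>>     if (n == 0) {
>>       return 0;
>>     }
>>     long sum = 0;
>>     for (int i = samples.length - n; i < samples.length; i++) {
>>       sum += samples[i];
>>     }
>>     return sum / n;
>>   }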
>>
>>   Jan
>>
>>
>> On 10.8.2016 02:48, Ravi Prakash wrote:
>>
>> Hi Jan!
>>
>> Thanks for your contribution. In your approach, what happens when a few
>> containers on a node are using "excessive" memory (so that the total
>> memory used > RAM available on the machine)? Do you have overcommit
>> enabled?
>>
>> Thanks
>> Ravi
>>
>> On Tue, Aug 9, 2016 at 1:31 AM, Jan Lukavský <
>> jan.lukav...@firma.seznam.cz> wrote:
>>
>>> Hello community,
>>>
>>> I have a question about container resource calculation in the
>>> NodeManager. Some time ago I filed the JIRA
>>> https://issues.apache.org/jira/browse/YARN-4681, which I thought might
>>> address our problems with containers being killed because of read-only
>>> mmapped memory blocks. The JIRA has not been resolved yet, but it turned
>>> out that the patch doesn't solve the problem for us. Some applications
>>> (namely Apache Spark) tend to allocate really large memory blocks
>>> outside the JVM heap (using mmap, but with MAP_PRIVATE), but only for
>>> short periods of time. We solved this by creating a smoothing resource
>>> calculator, which averages the memory usage of a container over some
>>> time period (say 5 minutes). This eliminates the problem of a container
>>> being killed for a short memory consumption peak, but at the same time
>>> preserves the ability to kill a container that *really* consumes an
>>> excessive amount of memory.
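>>>
>>> A minimal sketch of the smoothing idea (class, method and field names
>>> here are illustrative only, not the actual patch; it assumes the
>>> container monitor feeds in one RSS reading per monitoring interval):
>>>
>>>   import java.util.ArrayDeque;
>>>   import java.util.Deque;
>>>
>>>   /** Smooths a container's memory readings over a time window. */
>>>   public class WindowedMemoryUsage {
>>>     private static class Sample {
>>>       final long timestampMs;
>>>       final long rssBytes;
>>>       Sample(long timestampMs, long rssBytes) {
>>>         this.timestampMs = timestampMs;
>>>         this.rssBytes = rssBytes;
>>>       }
>>>     }
>>>
>>>     private final Deque<Sample> samples = new ArrayDeque<>();
>>>     private final long windowMs;  // e.g. 5 * 60 * 1000 for 5 minutes
>>>
>>>     public WindowedMemoryUsage(long windowMs) {
>>>       this.windowMs = windowMs;
>>>     }
>>>
>>>     /** Record the latest reading, drop samples older than the window. */
>>>     public void addSample(long nowMs, long rssBytes) {
>>>       samples.addLast(new Sample(nowMs, rssBytes));
>>>       while (!samples.isEmpty()
>>>           && nowMs - samples.peekFirst().timestampMs > windowMs) {
>>>         samples.removeFirst();
>>>       }
>>>     }
>>>
>>>     /** Windowed average the monitor would compare to the limit. */
>>>     public long averagedRssBytes() {
>>>       if (samples.isEmpty()) {
>>>         return 0;
>>>       }
>>>       long sum = 0;
>>>       for (Sample s : samples) {
>>>         sum += s.rssBytes;
>>>       }
>>>       return sum / samples.size();
>>>     }
>>>   }
>>>
>>> The container is killed only when averagedRssBytes() exceeds its limit,
>>> so a short mmap spike is tolerated while a sustained overrun is still
>>> caught.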
>>>
>>> My question is: does this seem like a systematic approach to you, and
>>> should I post our patch to the community, or am I thinking in the wrong
>>> direction from the beginning? :)
>>>
>>>
>>> Thanks for reactions,
>>>
>>>  Jan
>>>
>>>
>>>
>>
>>
>
>
> --
>
> Jan Lukavský
> Development Team Lead
> Seznam.cz, a.s.
> Radlická 3294/10
> 15000, Praha 5
> jan.lukavsky@firma.seznam.cz
> http://www.seznam.cz
>
>
