Hi Ravi,

We don't run into the situation where memory used > RAM, because the memory configured to be used by all containers on a node is less than the total amount of memory on the node (by a margin of, say, 10%). The spikes in container memory usage that are tolerated due to the averaging don't happen on all containers at once; they are essentially random in nature, so usually only a single running container spikes at a time, which doesn't cause any issues.

To fully answer your question: we do have overcommit enabled, so if we actually ran out of memory, bad things would happen. :) We are aware of that. The risk of running into the OOM killer can be controlled by the length of the averaging window - the longer the window, the more spikes are tolerated. Setting the averaging window length to 1 switches the feature off, falling back to the "standard" behavior, which is why I see it as an extension of the current approach that could be interesting to other people as well.
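
To make the window-length knob concrete with made-up numbers (these are purely illustrative, not our actual configuration): suppose a container has a 4 GB limit, the monitor samples usage every 3 seconds and the averaging window is 5 minutes, i.e. 100 samples. A 30-second spike to 8 GB on top of a 3 GB baseline gives an average of (90 * 3 + 10 * 8) / 100 = 3.5 GB, which stays under the limit, so the spike is tolerated. If the container instead stays at 8 GB, the average crosses 4 GB after roughly 20 samples (about a minute) and the container gets killed just as before. With a window length of 1, the very first 8 GB sample already exceeds the limit, which is exactly the standard behavior.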

  Jan

On 10.8.2016 02:48, Ravi Prakash wrote:
Hi Jan!

Thanks for your contribution. In your approach what happens when a few containers on a node are using "excessive" memory (so that total memory used > RAM available on the machine). Do you have overcommit enabled?

Thanks
Ravi

On Tue, Aug 9, 2016 at 1:31 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz> wrote:

    Hello community,

    I have a question about container resource calculation in the
    nodemanager. Some time ago I filed JIRA
    https://issues.apache.org/jira/browse/YARN-4681, which I thought
    might address our problems with containers being killed because
    of read-only mmapping of a memory block. The JIRA has not been
    resolved yet, and it turned out for us that the patch doesn't
    solve the problem. Some applications (namely Apache Spark) tend
    to allocate really large memory blocks outside the JVM heap
    (using mmap, but with MAP_PRIVATE), but only for short periods of
    time. We solved this by creating a smoothing resource calculator,
    which averages the memory usage of a container over some time
    period (say 5 minutes). This eliminates the problem of a
    container being killed for a short memory consumption peak, but
    at the same time preserves the ability to kill a container that
    *really* consumes an excessive amount of memory.
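
    Just to illustrate the idea, here is a minimal standalone sketch
    (not our actual patch - the class and method names
    SmoothedMemoryCheck, recordSample and isOverLimit are made up and
    it doesn't touch any real NodeManager classes):

        import java.util.ArrayDeque;
        import java.util.Deque;

        // Hypothetical helper: keeps the last N samples of a container's
        // physical memory usage and flags the container only when the
        // average over the window exceeds the configured limit.
        class SmoothedMemoryCheck {
            private final Deque<Long> samples = new ArrayDeque<>();
            private final int windowLength;  // samples per window, e.g. 5 min / interval
            private final long limitBytes;   // configured memory limit
            private long runningSum = 0;

            SmoothedMemoryCheck(int windowLength, long limitBytes) {
                this.windowLength = windowLength;
                this.limitBytes = limitBytes;
            }

            // Called once per monitoring interval with the current usage.
            void recordSample(long usedBytes) {
                samples.addLast(usedBytes);
                runningSum += usedBytes;
                if (samples.size() > windowLength) {
                    runningSum -= samples.removeFirst();
                }
            }

            // With windowLength == 1 this degenerates to the standard
            // "kill on the first sample over the limit" behavior.
            boolean isOverLimit() {
                return !samples.isEmpty()
                    && runningSum / samples.size() > limitBytes;
            }
        }

    Keeping a running sum makes the check O(1) per sample, so the
    averaging adds no measurable overhead to the monitoring loop.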

    My question is: does this seem like a systematic approach to you,
    and should I post our patch to the community, or am I thinking in
    the wrong direction from the beginning? :)


    Thanks for your reactions,

     Jan




