Hi Ravi,
we don't run into a situation where memory used > RAM, because the
memory configured to be used by all containers on a node is less than
the total amount of memory (by a margin of, say, 10%). The spikes in
container memory usage that are tolerated due to the averaging don't
happen on all containers at once; they are random in nature, so mostly
only a single running container "spikes" at a time, which doesn't cause
any issues. To fully answer your question: we have overcommit enabled,
and therefore, if we ran out of memory, bad things would happen. :) We
are aware of that. The risk of running into the OOM killer can be
controlled by the averaging window length - the longer the window, the
more spikes are tolerated. Setting the averaging window length to 1
switches this feature off and restores the "standard" behavior, which
is why I see it as an extension of the current approach that could be
interesting to other people as well.
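To make the window semantics concrete, here is a minimal sketch of the
kind of averaging I mean (the class and method names are made up for
this mail, they are not taken from our actual patch):

    // Minimal sketch of the averaging window; all names are illustrative.
    import java.util.ArrayDeque;
    import java.util.Deque;

    class SmoothedMemoryTracker {
        private final int windowLength;        // samples to average over
        private final Deque<Long> samples = new ArrayDeque<>();
        private long sum = 0;

        SmoothedMemoryTracker(int windowLength) {
            this.windowLength = windowLength;
        }

        // Called once per monitoring interval with the current RSS in bytes.
        long addSampleAndGetAverage(long rssBytes) {
            samples.addLast(rssBytes);
            sum += rssBytes;
            if (samples.size() > windowLength) {
                sum -= samples.removeFirst();
            }
            // With windowLength == 1 the average is just the latest sample,
            // i.e. the "standard" instantaneous behavior.
            return sum / samples.size();
        }
    }

The longer the window, the more samples a short spike is diluted
across, and the smaller its effect on the value that is checked against
the limit.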
Jan
On 10.8.2016 02:48, Ravi Prakash wrote:
Hi Jan!
Thanks for your contribution. In your approach, what happens when a few
containers on a node are using "excessive" memory (so that total memory
used > RAM available on the machine)? Do you have overcommit enabled?
Thanks
Ravi
On Tue, Aug 9, 2016 at 1:31 AM, Jan Lukavský
<jan.lukav...@firma.seznam.cz> wrote:
Hello community,
I have a question about container resource calculation in the
nodemanager. Some time ago I filed JIRA
https://issues.apache.org/jira/browse/YARN-4681, which I thought might
address our problem with containers being killed because of read-only
mmapping of memory blocks. The JIRA has not been resolved yet, but it
turned out that the patch doesn't solve the problem for us. Some
applications (namely Apache Spark) tend to allocate really large memory
blocks outside the JVM heap (using mmap, but with MAP_PRIVATE), though
only for short periods of time. We solved this by creating a smoothing
resource calculator, which averages the memory usage of a container
over some time period (say 5 minutes). This eliminates the problem of a
container being killed for a short memory consumption peak, but at the
same time preserves the ability to kill a container that *really*
consumes an excessive amount of memory.
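As a rough illustration of the idea (a simplified sketch, not our
actual patch; containerIsRunning(), sampleRssBytes(), killContainer()
and pmemLimitBytes are placeholders for whatever the containers monitor
really uses):

    // Simplified sketch of the smoothing check; placeholder names throughout.
    void monitorContainer(long pmemLimitBytes) throws InterruptedException {
        final long intervalMillis = 3000;            // monitoring interval
        final int windowLength =
            (int) (5 * 60 * 1000 / intervalMillis);  // ~5-minute window
        java.util.Deque<Long> samples = new java.util.ArrayDeque<>();
        long sum = 0;
        while (containerIsRunning()) {
            long rss = sampleRssBytes();
            samples.addLast(rss);
            sum += rss;
            if (samples.size() > windowLength) {
                sum -= samples.removeFirst();
            }
            long smoothedRss = sum / samples.size();
            // A short mmap peak barely moves the average, but a container
            // that *really* holds excessive memory drives the average over
            // the limit and still gets killed.
            if (smoothedRss > pmemLimitBytes) {
                killContainer("smoothed memory usage over limit");
                return;
            }
            Thread.sleep(intervalMillis);
        }
    }

With a 3-second interval and a 5-minute window that is 100 samples, so
a single 3-second spike of, say, 10 GB adds only about 100 MB to the
average.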
My question is: does this seem like a sensible approach to you, and
should I post our patch to the community, or am I thinking in the wrong
direction from the beginning? :)
Thanks for any reactions,
Jan