Jan,

As part of YARN-1011 (the oversubscription work), we are looking at better (faster) ways of monitoring and enforcement, and are considering putting all YARN containers under a cgroup with a hard limit, so that YARN as a whole does not go over the limit but individual containers are allowed to run over. The details are not clear yet, but hopefully that will help you.
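To make the idea concrete, the enforcement side could be as simple as capping a parent cgroup that all container cgroups live under. The sketch below is purely illustrative (the cgroup path, class name, and limit value are made up; the actual YARN-1011 design is not settled):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    /**
     * Illustrative sketch only: put a hard memory limit on a parent
     * cgroup that all YARN container cgroups live under, so the node's
     * aggregate container footprint is capped even if individual
     * containers spike. Paths and numbers are placeholders.
     */
    public class YarnParentCgroupSketch {

      // Hypothetical mount point of the cgroup v1 memory controller.
      private static final Path YARN_MEMORY_CGROUP =
          Paths.get("/sys/fs/cgroup/memory/hadoop-yarn");

      /** Write a hard limit (in bytes) for the whole YARN hierarchy. */
      static void setHardLimit(long limitBytes) throws IOException {
        // Individual container cgroups under hadoop-yarn/ may exceed
        // their own shares; the kernel enforces only this aggregate cap.
        Files.write(YARN_MEMORY_CGROUP.resolve("memory.limit_in_bytes"),
            Long.toString(limitBytes).getBytes());
      }

      public static void main(String[] args) throws IOException {
        setHardLimit(64L * 1024 * 1024 * 1024); // e.g. 64 GiB for all containers
      }
    }

With the cap only on the parent, the kernel does the enforcement, and individual containers are free to spike above their own shares as long as the node-wide total stays under the limit.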
On Tue, Aug 16, 2016 at 12:53 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz> wrote:

> Hi Ravi,
>
> sorry for the late answer. :) We are on hadoop 2.6-cdh5.7.
>
> Cheers,
> Jan
>
> On 12.8.2016 01:57, Ravi Prakash wrote:
>
>> Hi Jan!
>>
>> Yes! Makes sense. I'm sure there were bigger changes for the
>> ResourceHandler. Which version are you on?
>>
>> Cheers
>> Ravi
>>
>> On Thu, Aug 11, 2016 at 7:48 AM, Jan Lukavský
>> <jan.lukav...@firma.seznam.cz> wrote:
>>
>> Hi Ravi,
>>
>> I don't think cgroups will help us, because we don't want to
>> impose a hard limit on memory usage; we just want to allow short
>> time periods when a container can consume more memory than its
>> limit. We don't want to put the limit too high, because that
>> causes underutilization of our cluster, but setting it
>> "reasonable" causes applications to fail (because random
>> containers are killed due to spikes). That's why we created the
>> time-window averaging resource calculator, and I was trying to
>> find out whether anybody else is having similar kinds of issues.
>> If so, I could contribute our extension (and therefore we would
>> not have to maintain it ourselves in a separate repository :)).
>> The resource calculator is for hadoop 2.6, and I suppose there
>> might be larger changes around this in higher versions?
>>
>> Cheers,
>> Jan
>>
>> On 10.8.2016 19:23, Ravi Prakash wrote:
>>
>>> Hi Jan!
>>>
>>> Thanks for your explanation. I'm glad that works for you! :-)
>>> https://issues.apache.org/jira/browse/YARN-5202 is something
>>> that Yahoo! talked about at the Hadoop Summit (and it seems the
>>> community may be going in a similar direction, although not
>>> exactly the same). There's also
>>> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java.
>>> Ideally, at my company we'd like memory limits to also be imposed
>>> by cgroups, because we have had the OOM-killer wreak havoc a
>>> couple of times, but from what I know, that is not an option yet.
>>>
>>> Cheers
>>> Ravi
>>>
>>> On Wed, Aug 10, 2016 at 1:54 AM, Jan Lukavský
>>> <jan.lukav...@firma.seznam.cz> wrote:
>>>
>>> Hi Ravi,
>>>
>>> we don't run into the situation where memory used > RAM, because
>>> the memory configured to be used by all containers on a node is
>>> less than the total amount of memory (by a margin of, say, 10%).
>>> The spikes of container memory usage that are tolerated due to
>>> the averaging don't happen on all containers at once; they are
>>> more of a random nature, so mostly only a single running
>>> container "spikes", which doesn't cause any issues. To fully
>>> answer your question, we have overcommit enabled, and therefore,
>>> if we ran out of memory, bad things would happen. :) We are aware
>>> of that. The risk of running into the OOM-killer can be
>>> controlled by the averaging window length - as the length grows,
>>> more and more spikes are tolerated. Setting the averaging window
>>> length to 1 switches this feature off, turning it back into the
>>> "standard" behavior, which is why I see it as an extension of the
>>> current approach that could be interesting to other people as
>>> well.
>>>
>>> Jan
>>>
>>> On 10.8.2016 02:48, Ravi Prakash wrote:
>>>
>>>> Hi Jan!
>>>>
>>>> Thanks for your contribution. In your approach, what happens
>>>> when a few containers on a node are using "excessive" memory
>>>> (so that the total memory used > RAM available on the machine)?
>>>> Do you have overcommit enabled?
>>>>
>>>> Thanks
>>>> Ravi
>>>>
>>>> On Tue, Aug 9, 2016 at 1:31 AM, Jan Lukavský
>>>> <jan.lukav...@firma.seznam.cz> wrote:
>>>>
>>>> Hello community,
>>>>
>>>> I have a question about container resource calculation in the
>>>> nodemanager. Some time ago I filed
>>>> https://issues.apache.org/jira/browse/YARN-4681, which I thought
>>>> might address our problems with containers being killed because
>>>> of read-only mmapped memory blocks. The JIRA has not been
>>>> resolved yet, but it turned out for us that the patch doesn't
>>>> solve the problem. Some applications (namely Apache Spark) tend
>>>> to allocate really large memory blocks outside the JVM heap
>>>> (using mmap, but with MAP_PRIVATE), but only for short time
>>>> periods. We solved this by creating a smoothing resource
>>>> calculator, which averages the memory usage of a container over
>>>> some time period (say 5 minutes). This eliminates the problem of
>>>> a container being killed for a short memory consumption peak,
>>>> but at the same time preserves the ability to kill a container
>>>> that *really* consumes an excessive amount of memory.
>>>>
>>>> My question is: does this seem like a systematic approach to
>>>> you, and should I post our patch to the community, or am I
>>>> thinking in the wrong direction from the beginning? :)
>>>>
>>>> Thanks for any reactions,
>>>>
>>>> Jan
>
> --
> Jan Lukavský
> Head of Development Team
> Seznam.cz, a.s.
> Radlická 3294/10
> 15000, Praha 5
>
> jan.lukav...@firma.seznam.cz
> http://www.seznam.cz
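For readers skimming the archive: the time-window averaging check Jan describes could look roughly like the sketch below. All names are invented for illustration (this is not his actual patch), and it assumes one memory sample per monitoring interval:

    import java.util.ArrayDeque;
    import java.util.Deque;

    /**
     * Sketch of the time-window averaging idea from this thread
     * (invented names; not the actual patch). A container is flagged
     * for killing only when its *average* memory usage over a sliding
     * window exceeds the limit, so short mmap spikes are tolerated.
     * With windowSize = 1 this degenerates to the standard
     * "kill on any sample over the limit" behavior, as Jan notes above.
     */
    public class WindowedMemoryCheck {

      private final int windowSize;            // number of monitoring samples
      private final Deque<Long> samples = new ArrayDeque<>();
      private long windowSum = 0;

      public WindowedMemoryCheck(int windowSize) {
        this.windowSize = Math.max(1, windowSize);
      }

      /** Record one sample; return true if the container should be killed. */
      public boolean sampleAndCheck(long usedBytes, long limitBytes) {
        samples.addLast(usedBytes);
        windowSum += usedBytes;
        if (samples.size() > windowSize) {
          windowSum -= samples.removeFirst(); // slide the window forward
        }
        long average = windowSum / samples.size();
        return average > limitBytes;          // kill only on sustained overuse
      }
    }

A monitor would call sampleAndCheck(...) once per heartbeat and kill the container only on a true return, so a single short spike within the window is tolerated while sustained overuse is still caught.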