Hi Jan! Yes! Makes sense. I'm sure there were bigger changes for the ResourceHandler. Which version are you on?
Cheers
Ravi

On Thu, Aug 11, 2016 at 7:48 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz> wrote:

> Hi Ravi,
>
> I don't think cgroups will help us, because we don't want to impose a hard limit on the memory usage; we just want to allow for short time periods when a container can consume more memory than its limit. We don't want to put the limit too high, because that causes underutilization of our cluster, but setting it to a "reasonable" value causes applications to fail (because of random containers being killed because of spikes). That's why we created the time-window averaging resource calculator, and I was trying to find out if anybody else is having similar kinds of issues. If so, I could contribute our extension (and therefore we will not have to maintain it ourselves in a separate repository :)). The resource calculator is for Hadoop 2.6, and I suppose there might be larger changes around this in higher versions?
>
> Cheers,
> Jan
>
> On 10.8.2016 19:23, Ravi Prakash wrote:
>
> Hi Jan!
>
> Thanks for your explanation. I'm glad that works for you! :-) https://issues.apache.org/jira/browse/YARN-5202 is something that Yahoo! talked about at the Hadoop Summit (and it seems the community may be going in a similar direction, although not exactly the same). There's also https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java . Ideally at my company we'd like memory limits also to be imposed by cgroups, because we have had the OOM-killer wreak havoc a couple of times, but from what I know, that is not an option yet.
>
> Cheers
> Ravi
>
> On Wed, Aug 10, 2016 at 1:54 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz> wrote:
>
>> Hi Ravi,
>>
>> We don't run into the situation where memory used > RAM, because the memory configured to be used by all containers on a node is less than the total amount of memory (by a margin of, say, 10%). The spikes of container memory usage that are tolerated due to the averaging don't happen on all containers at once, but are more of a random nature, so mostly only a single running container "spikes" at a time, which doesn't cause any issues. To fully answer your question, we have overcommit enabled and therefore, if we were to run out of memory, bad things would happen. :) We are aware of that. The risk of running into the OOM-killer can be controlled by the averaging window length - as the length grows, more and more spikes are tolerated. Setting the averaging window length to 1 switches this feature off, turning it back into the "standard" behavior, which is why I see it as an extension of the current approach, which could be interesting to other people as well.
>>
>> Jan
>>
>> On 10.8.2016 02:48, Ravi Prakash wrote:
>>
>> Hi Jan!
>>
>> Thanks for your contribution. In your approach, what happens when a few containers on a node are using "excessive" memory (so that total memory used > RAM available on the machine)? Do you have overcommit enabled?
>>
>> Thanks
>> Ravi
>>
>> On Tue, Aug 9, 2016 at 1:31 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz> wrote:
>>
>>> Hello community,
>>>
>>> I have a question about container resource calculation in the nodemanager.
>>> Some time ago I filed JIRA https://issues.apache.org/jira/browse/YARN-4681, which I thought might address our problems with containers being killed because of read-only mmapping of memory blocks. The JIRA has not been resolved yet, but it turned out for us that the patch doesn't solve the problem. Some applications (namely Apache Spark) tend to allocate really large memory blocks outside the JVM heap (using mmap, but with MAP_PRIVATE), but only for short time periods. We solved this by creating a smoothing resource calculator, which averages the memory usage of a container over some time period (say 5 minutes). This eliminates the problem of a container being killed for a short memory consumption peak, but at the same time preserves the ability to kill a container that *really* consumes an excessive amount of memory.
>>>
>>> My question is, does this seem like a systematic approach to you, and should I post our patch to the community, or am I thinking in the wrong direction from the beginning? :)
>>>
>>> Thanks for any reactions,
>>>
>>> Jan
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
>>> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>>>
>
> --
> Jan Lukavský
> Development Team Lead
> Seznam.cz, a.s.
> Radlická 3294/10
> 15000, Praha 5
> jan.lukavsky@firma.seznam.cz
> http://www.seznam.cz
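For readers skimming the archive, here is a minimal, self-contained sketch of the time-window averaging idea described above. This is not the actual YARN-4681 patch or any YARN/nodemanager API; all class and method names are made up for illustration. The point it shows: a container is flagged only when its average memory usage over a sliding window exceeds the limit, so a short spike is tolerated while sustained overuse is still caught.

import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a time-window averaging memory check.
 * Not actual YARN code; names are illustrative only.
 */
public class WindowedMemoryCheck {

  /** One memory sample: timestamp plus observed usage in bytes. */
  private static final class Sample {
    final long timestampMs;
    final long usageBytes;
    Sample(long timestampMs, long usageBytes) {
      this.timestampMs = timestampMs;
      this.usageBytes = usageBytes;
    }
  }

  private final long windowMs;    // averaging window, e.g. 5 minutes
  private final long limitBytes;  // configured container memory limit
  private final Deque<Sample> samples = new ArrayDeque<>();

  public WindowedMemoryCheck(long windowMs, long limitBytes) {
    this.windowMs = windowMs;
    this.limitBytes = limitBytes;
  }

  /**
   * Record a new usage sample and report whether the container should be
   * considered over its limit. With windowMs == 0 only the newest sample
   * survives, which roughly degenerates to the standard instantaneous
   * check (analogous to "averaging window length 1" in the thread above).
   */
  public boolean recordAndCheck(long nowMs, long usageBytes) {
    samples.addLast(new Sample(nowMs, usageBytes));
    // Drop samples that have fallen out of the averaging window.
    while (!samples.isEmpty()
        && nowMs - samples.peekFirst().timestampMs > windowMs) {
      samples.removeFirst();
    }
    // Average the usage over the remaining samples and compare to the limit.
    long sum = 0;
    for (Sample s : samples) {
      sum += s.usageBytes;
    }
    long average = sum / samples.size();
    return average > limitBytes;
  }

  /** Tiny usage example: a short spike is tolerated, sustained overuse is not. */
  public static void main(String[] args) {
    long gb = 1L << 30;
    WindowedMemoryCheck check =
        new WindowedMemoryCheck(5 * 60 * 1000L, 2 * gb);
    System.out.println(check.recordAndCheck(0L, gb));          // avg 1 GB  -> false
    System.out.println(check.recordAndCheck(10_000L, 3 * gb)); // avg 2 GB  -> false (spike averaged out)
    System.out.println(check.recordAndCheck(20_000L, 3 * gb)); // avg ~2.3 GB -> true (sustained overuse)
  }
}

As the thread notes, the window length is the knob that trades spike tolerance against the risk of actually exhausting node memory: a longer window tolerates more and longer spikes, while a minimal window reduces to the standard per-sample limit check.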