[ https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140515#comment-15140515 ]
Jan Lukavsky commented on YARN-4681:
------------------------------------

[~cnauroth], I tested this patch against our jobs and it helps somewhat, but it does not solve the whole problem. Another issue is that we see spikes of direct memory allocations (so far I haven't tracked down where exactly they come from), which led me to the idea that rather than calculating the exact memory consumption of a container, it might help to average it over some time period (configurable, defaulting to zero, which would preserve the current behavior). So, first I will modify the patch as you suggest (so that if the Locked field is missing, the behavior of the ProcfsBasedProcessTree will be exactly the same as before). I will then try to add the time averaging and let you know whether it helped. Regarding the more aggressive strategies, I ran some experiments and I don't think they would help.

> ProcfsBasedProcessTree should not calculate private clean pages
> ---------------------------------------------------------------
>
>                 Key: YARN-4681
>                 URL: https://issues.apache.org/jira/browse/YARN-4681
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Jan Lukavsky
>        Attachments: YARN-4681.patch
>
> ProcfsBasedProcessTree in the Node Manager calculates memory used by a process
> tree by parsing {{/proc/<pid>/smaps}}, where it computes {{min(Pss,
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages
> that are not {{mlocked}} can be reclaimed by the kernel, this should be changed
> to counting only {{Locked}} pages instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
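The accounting difference under discussion can be sketched as follows. This is an illustrative standalone class, not the actual ProcfsBasedProcessTree code; the class and method names are made up for the example, while the field names ({{Pss}}, {{Shared_Dirty}}, {{Private_Dirty}}, {{Private_Clean}}, {{Locked}}) are the real per-mapping keys in {{/proc/<pid>/smaps}}:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two accounting strategies discussed in this issue:
// the current formula min(Pss, Shared_Dirty) + Private_Dirty + Private_Clean
// versus counting only Locked pages. Hypothetical helper class, not YARN code.
public class SmapsAccounting {

    // Parse one smaps mapping's "Key:  N kB" lines into a map of kB values.
    static Map<String, Long> parseMapping(String smapsText) {
        Map<String, Long> fields = new HashMap<>();
        for (String line : smapsText.split("\n")) {
            String[] parts = line.trim().split(":\\s+");
            if (parts.length == 2 && parts[1].endsWith("kB")) {
                fields.put(parts[0],
                        Long.parseLong(parts[1].replace("kB", "").trim()));
            }
        }
        return fields;
    }

    // Current formula: private clean pages are counted even though the
    // kernel may reclaim them when they are not mlocked.
    static long currentFormulaKb(Map<String, Long> m) {
        return Math.min(m.getOrDefault("Pss", 0L),
                        m.getOrDefault("Shared_Dirty", 0L))
                + m.getOrDefault("Private_Dirty", 0L)
                + m.getOrDefault("Private_Clean", 0L);
    }

    // Proposed alternative: count only pages locked into RAM.
    static long lockedKb(Map<String, Long> m) {
        return m.getOrDefault("Locked", 0L);
    }

    public static void main(String[] args) {
        // Fabricated sample mapping: 300 kB of private clean pages inflate
        // the current formula but are reclaimable, so Locked reports 0.
        String sample = "Pss:            400 kB\n"
                      + "Shared_Dirty:   100 kB\n"
                      + "Private_Dirty:  200 kB\n"
                      + "Private_Clean:  300 kB\n"
                      + "Locked:           0 kB\n";
        Map<String, Long> m = parseMapping(sample);
        // min(400, 100) + 200 + 300 = 600 kB vs. 0 kB locked
        System.out.println("current=" + currentFormulaKb(m)
                + " kB, locked=" + lockedKb(m) + " kB");
    }
}
```

The gap between the two numbers for the same mapping is exactly the reclaimable memory the issue title refers to; a process with many unlocked private clean pages (e.g. read-only mapped files) looks much larger under the current formula than under a Locked-only count.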