On 29.08.2012 at 17:02, Ben De Luca wrote:

> I was wondering how people deal with OOM conditions on their cluster.
> We constantly have machines that die because the OOM killer takes out
> critical system services.
>
> Does anyone have any experience with the oom_adj proc value, or a patch
> to Grid Engine to support it?

Do the users request memory via virtual_free or h_vmem? Touching the OOM settings looks like fixing the symptoms rather than the cause.

-- Reuti

> /proc/[pid]/oom_adj (since Linux 2.6.11)
>     This file can be used to adjust the score used to select which
>     process should be killed in an out-of-memory (OOM) situation. The
>     kernel uses this value for a bit-shift operation of the process's
>     oom_score value: valid values are in the range -16 to +15, plus the
>     special value -17, which disables OOM-killing altogether for this
>     process. A positive score increases the likelihood of this process
>     being killed by the OOM-killer; a negative score decreases the
>     likelihood. The default value for this file is 0; a new process
>     inherits its parent's oom_adj setting. A process must be privileged
>     (CAP_SYS_RESOURCE) to update this file.
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
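
For reference, the oom_adj rules quoted above can be sketched in a few lines of Python. This is only an illustration, not part of Grid Engine and not the kernel's actual scoring code: the function names and the proc_root parameter are made up for this example, and shifted_score is a simplified model of the bit-shift that the man page text describes. Per the quoted text, writing the real file needs CAP_SYS_RESOURCE.

```python
import os

# Values taken from the quoted proc(5) text.
OOM_DISABLE = -17                    # special value: never OOM-kill this process
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15   # ordinary range


def validate_oom_adj(value):
    """Return value if it is a legal oom_adj setting, else raise ValueError."""
    if value == OOM_DISABLE or OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        return value
    raise ValueError("oom_adj must be -17 or in -16..15, got %r" % (value,))


def shifted_score(score, oom_adj):
    """Simplified model (assumption) of the bit-shift applied to oom_score."""
    validate_oom_adj(oom_adj)
    if oom_adj == OOM_DISABLE:
        return 0                      # process is exempt from OOM-killing
    if oom_adj >= 0:
        return score << oom_adj       # positive: more likely to be killed
    return score >> -oom_adj          # negative: less likely to be killed


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write value to <proc_root>/<pid>/oom_adj and return the path written.

    proc_root is a hypothetical parameter added so the function can be
    tried against a scratch directory instead of the real /proc.
    """
    validate_oom_adj(value)
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as handle:
        handle.write("%d\n" % value)
    return path
```

A wrapper around a critical daemon could call set_oom_adj(os.getpid(), -17) before exec'ing it, since child processes inherit the setting. As noted in the reply, that only protects the daemon; it does not stop jobs from overcommitting memory in the first place.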
