Jake Carroll <jake.carr...@uq.edu.au> writes:

> We figured it out!
>
>
> Specific user binary was not respecting vf memory complex and decided to
> use all the RAM on random nodes it landed on!

There's nothing to respect.  If you use vf, it's only relevant to
scheduling jobs, not memory usage while running.  If you want to
restrict usage (as you typically should), make h_vmem consumable with
appropriate values on the hosts, and have jobs request h_vmem instead.
See various references in the archives.
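To make the distinction concrete, here is a toy Python model of what a consumable h_vmem gives you. It is not Grid Engine code, and the names Job, Host, place and allocate are hypothetical, as is the 64G host figure. The scheduler books each job's h_vmem request against a per-host capacity, and the same request then acts as the job's hard limit, so allocations past it fail. In Grid Engine itself this corresponds to marking h_vmem consumable in the complex configuration and setting a value for it in each execution host's complex_values.

```python
from dataclasses import dataclass

GiB = 1024 ** 3  # bytes


@dataclass
class Job:
    """Hypothetical job: its h_vmem request doubles as a hard limit."""
    h_vmem: int    # requested h_vmem, in bytes
    used: int = 0  # virtual memory allocated so far, in bytes

    def allocate(self, amount: int) -> None:
        # Run-time side: an allocation that would exceed the request fails
        # instead of eating into memory booked for other jobs.
        if self.used + amount > self.h_vmem:
            raise MemoryError("allocation would exceed h_vmem")
        self.used += amount


@dataclass
class Host:
    """Hypothetical execution host with h_vmem configured as a consumable."""
    capacity: int    # the host's h_vmem value, in bytes (assumed figure below)
    booked: int = 0  # sum of h_vmem requests of jobs placed on this host

    def place(self, job: Job) -> bool:
        # Scheduling side: book the request only if it still fits.
        if self.booked + job.h_vmem > self.capacity:
            return False
        self.booked += job.h_vmem
        return True


if __name__ == "__main__":
    host = Host(capacity=64 * GiB)
    big, other = Job(h_vmem=48 * GiB), Job(h_vmem=32 * GiB)
    print(host.place(big))    # True: 48G of 64G booked
    print(host.place(other))  # False: 48G + 32G would exceed 64G
    big.allocate(40 * GiB)    # fine, within its 48G request
    try:
        big.allocate(16 * GiB)  # 56G > 48G
    except MemoryError as exc:
        print(exc)            # allocation would exceed h_vmem
```

The point of the model is that the same number is used twice. A plain vf request only takes part in something like the place() check (against the host's reported free memory at scheduling time), and nothing corresponding to allocate() stops the job growing afterwards.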

> How this generated a 137, and the explanation for what we were told a 137
> meant really threw us off however!

How come?  There was a reference to the documentation with an explicit
answer to the question.  (Advice on examining messages and syslog is in
http://arc.liv.ac.uk/SGE/howto/troubleshooting.html.)
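For reference, the usual convention (which the exit status Grid Engine reports follows) is that a value above 128 means the process was killed by signal number (status - 128), so 137 is 128 + 9, i.e. SIGKILL. A small Python sketch of that decoding; the function name describe_exit_status is made up for illustration, and signal names assume a POSIX system:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))  # killed by signal 9 (SIGKILL)
    print(describe_exit_status(0))    # exited normally with code 0
```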

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
