On Wed, 11 Apr 2012 at 10:46am, orlando.richa...@ed.ac.uk wrote:
> We ran into a problem with InfiniBand-based MPI jobs caused by a change in
> the default max locked memory ulimit which init-spawned processes start with,
> between RHEL5 and RHEL6.
>
> If you run a job through the old and new environments which just does "ulimit
> -a", do you see a difference? In particular, do you see a difference in the
> max locked memory (ulimit -l)?
I don't have any C5 hosts left in my real cluster, but I was able to
dredge up my old VirtualBox test cluster and run ulimit on both. The
differences are (sorry for any bad table formatting):
resource                     CentOS-5 value   CentOS-6 value
pending signals              6143             30507
max locked memory (kbytes)   32               64
max user processes           6143             30507
So every limit that differs *increased* going from C5 to C6.
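For anyone repeating this comparison, here is a minimal sketch of a job
script that prints only the three limits listed above, so the output from the
two environments can be diffed. The function names (show_limit,
report_limits) are illustrative and not something from this thread; any flag
the running shell does not support is reported as "n/a".

```shell
#!/bin/sh
# Illustrative helper (not from the thread): print selected soft limits,
# one per line, in a fixed format suitable for diff(1).

show_limit() {
    # $1 = human-readable label, $2 = ulimit flag (e.g. -l)
    value=$(ulimit "$2" 2>/dev/null) || value="n/a"
    printf '%-20s %s\n' "$1" "$value"
}

report_limits() {
    show_limit "pending signals"    -i
    show_limit "max locked memory"  -l
    show_limit "max user processes" -u
}

report_limits
```

Submitting this as a job in each environment and diffing the two output
files should show the same differences as the table.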
> Our fix for this was to put a "ulimit -l unlimited" in our sgeexecd init
> script, immediately before the sge_execd startup command. In our case,
> "unlimited" is the required value as per the QLogic InfiniBand setup process.
I tried this anyway and still saw the same failures. Thanks for having a
look, though.
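The placement described in the quoted fix matters because a limit set with
ulimit applies to the current shell and is inherited by everything started
from it afterwards. Here is a small sketch of that inheritance, which lowers
the limit inside a subshell so it needs no root access. The function name
(child_memlock_limit) is illustrative, and the value 16 is arbitrary: it
assumes the current locked-memory limit is at least 16 kbytes. The
commented-out lines only indicate the ordering; the actual startup line in an
sgeexecd script will look different.

```shell
#!/bin/sh
# Demonstrate that a locked-memory limit set in a shell is inherited by
# processes started from that shell afterwards.

child_memlock_limit() {
    # $1 = limit in kbytes to set before starting the child
    (
        ulimit -l "$1" || exit 1   # affects this subshell only
        sh -c 'ulimit -l'          # the child reports the inherited limit
    )
}

child_memlock_limit 16

# In an init script the same ordering applies (hypothetical excerpt):
#   ulimit -l unlimited
#   ... existing sge_execd startup command ...
```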
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF