On Wed, 11 Apr 2012 at 10:46am, orlando.richa...@ed.ac.uk wrote:

We ran into a problem with InfiniBand-based MPI jobs caused by a change, between RHEL5 and RHEL6, in the default max locked memory ulimit that init-spawned processes start with.

If you run a job that just does "ulimit -a" through the old and new environments, do you see a difference? In particular, do you see a difference in the max locked memory (ulimit -l)?
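
A minimal sketch of such a test job, assuming bash is available on the execution hosts. The function name report_limits is hypothetical and only for illustration:

```shell
#!/bin/bash
# Hypothetical test job: report the limits a job actually inherits
# from the daemon that started it.
report_limits() {
    # Record which host ran the job so the two outputs can be told apart.
    echo "host: $(uname -n)"
    # The value of most interest for InfiniBand MPI jobs.
    echo "max locked memory (ulimit -l): $(ulimit -l)"
    # Full list of limits, for comparison between environments.
    ulimit -a
}

report_limits
```

Submitting the same script to a host in each environment (for example with qsub) and diffing the two output files should show which limits differ.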

I don't have any C5 hosts left in my real cluster, but I was able to dredge up my old VirtualBox test cluster and run ulimit on both. The differences are (sorry for any bad table formatting):

resource                      CentOS-5 value    CentOS-6 value
pending signals                         6143             30507
max locked memory (kbytes)                32                64
max user processes                      6143             30507

So every resource limit that differs *increased* going from C5 to C6.

Our fix for this was to put a "ulimit -l unlimited" in our sgeexecd init script, immediately before the sge_execd startup command. In our case, "unlimited" is the required value as per the QLogic InfiniBand setup process.
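
A rough sketch of what that change could look like, assuming a bash init script run as root. The function name raise_memlock is hypothetical, and the real sgeexecd script will differ from site to site:

```shell
#!/bin/bash
# Hypothetical fragment for an sgeexecd init script.
# Raise the locked-memory limit in this shell so that sge_execd, and the
# jobs it spawns, inherit it. Raising the hard limit normally requires root.
raise_memlock() {
    if ! ulimit -l unlimited 2>/dev/null; then
        echo "warning: could not raise max locked memory" >&2
    fi
    # Report the value the daemon would inherit.
    echo "max locked memory now: $(ulimit -l)"
}

raise_memlock
# The script's existing sge_execd startup command would follow here.
```

The limit has to be set in the same shell that launches the daemon, since ulimit affects only the current shell and its children.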

I tried this anyway and still saw the same failures. Thanks for having a look, though.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
