>>>>> "Bill" == Bill Broadley <b...@cse.ucdavis.edu> writes:
It seems the half-life of knowledge on the list has decayed to two
weeks :)

I've commented in detail on this (non-)issue on 2014-08-20:

http://www.open-mpi.org/community/lists/users/2014/08/25090.php

A change in the FAQ and a fix in the code would really be nice at
this stage.

Roland

-------
http://www.q-leap.com / http://qlustar.com
--- HPC / Storage / Cloud Linux Cluster OS ---

    Bill> I've set up several clusters over the years with OpenMPI. I
    Bill> often get the below error:

    Bill> WARNING: It appears that your OpenFabrics subsystem is
    Bill> configured to only allow registering part of your physical
    Bill> memory. This can cause MPI jobs to run with erratic
    Bill> performance, hang, and/or crash. ...
    Bill> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

    Bill> Local host: c2-31
    Bill> Registerable memory: 32768 MiB
    Bill> Total memory: 64398 MiB

    Bill> I'm well aware of the normal fixes, and have implemented them
    Bill> in puppet to ensure compute nodes get the changes. To be
    Bill> paranoid I've implemented all the changes, and they all worked
    Bill> under Ubuntu 13.10.

    Bill> However, with Ubuntu 14.04 it seems like it's not working, thus
    Bill> the above message.
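
For reference, the FAQ entry linked in the warning gives the registerable-memory limit for mlx4-based HCAs as 2^log_num_mtt * 2^log_mtts_per_seg * page size. A minimal sketch of that arithmetic follows. It assumes an mlx4_core driver, which the original post does not confirm. The helper name max_reg_mib and the example values 20, 3 and 4096 are illustrative only, not read from the node in question.

```shell
#!/bin/bash
# Sketch: registerable memory (in MiB) implied by the mlx4_core MTT settings,
# using the formula from the Open MPI FAQ:
#   max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
# max_reg_mib is a hypothetical helper name, not part of any tool.
max_reg_mib() {
    local log_num_mtt=$1 log_mtts_per_seg=$2 page_size=$3
    echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size / 1024 / 1024 ))
}

# Assumed example values (not taken from the original post):
# log_num_mtt=20, log_mtts_per_seg=3, 4096-byte pages.
max_reg_mib 20 3 4096
```

With these assumed values the result is 32768 MiB, the same figure the warning reports, which would suggest the number comes from the driver's MTT settings and not from the memlock ulimit. On a node with mlx4_core loaded, the actual values should be readable under /sys/module/mlx4_core/parameters/, and the page size from `getconf PAGESIZE`.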

    Bill> As recommended by the FAQs I've implemented:

    Bill> 1) ulimit -l unlimited in /etc/profile.d/slurm.sh
    Bill> 2) PropagateResourceLimitsExcept=MEMLOCK in slurm.conf
    Bill> 3) UsePAM=1 in slurm.conf
    Bill> 4) in /etc/security/limits.conf
    Bill>      * hard memlock unlimited
    Bill>      * soft memlock unlimited
    Bill>      * hard stack unlimited
    Bill>      * soft stack unlimited

    Bill> My changes seem to be working; if I submit this to slurm:

    Bill> #!/bin/bash -l
    Bill> ulimit -l
    Bill> hostname
    Bill> mpirun bash -c ulimit -l
    Bill> mpirun ./relay 1 131072

    Bill> I get:

    Bill> unlimited
    Bill> c2-31
    Bill> unlimited
    Bill> unlimited
    Bill> unlimited
    Bill> unlimited
    Bill> <above error message only 32GB of Registerable memory>
    Bill> <output of mpirun relay>

    Bill> Is there some new kernel parameter, ofed parameter, or similar
    Bill> that controls locked pages now? The kernel is 3.13.0-36 and
    Bill> the libopenmpi-dev package is 1.6.5.

    Bill> Since the ulimit -l is getting to both the slurm launched
    Bill> script and also to the mpirun launched binaries I'm pretty
    Bill> puzzled.

    Bill> Any suggestions?

    Bill> _______________________________________________
    Bill> users mailing list us...@open-mpi.org
    Bill> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
    Bill> Link to this post:
    Bill> http://www.open-mpi.org/community/lists/users/2014/10/25544.php
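
A side note on the test script quoted above: in `bash -c ulimit -l` the `-l` is not passed to `ulimit`. Bash takes the first non-option argument after `-c` as the command string and assigns the next one to `$0`, so the child runs a bare `ulimit`, which reports the file-size limit (`-f`). The sketch below shows a quoted form that does query the locked-memory limit. The function names child_memlock and child_arg0 are made up for illustration.

```shell
#!/bin/bash
# Sketch: check the locked-memory limit as seen by a child shell.
# child_memlock and child_arg0 are hypothetical helper names.

# Quoted form: the whole string 'ulimit -l' is the command the child runs.
child_memlock() {
    bash -c 'ulimit -l'
}

# Unquoted form, for comparison: here "-l" ends up in $0 of the child,
# so a command string of plain `ulimit` would never see it.
child_arg0() {
    bash -c 'printf "%s\n" "$0"' -l
}

echo "memlock limit in child shell: $(child_memlock)"
echo "\$0 in child shell when -l is left unquoted: $(child_arg0)"
```

Under mpirun the quoted form would be `mpirun bash -c 'ulimit -l'`. This does not necessarily change the conclusion in the quoted post, since limits are inherited and the first `ulimit -l` in the script already printed `unlimited`, but it makes the per-rank check test what it is meant to test.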