>>>>> "Bill" == Bill Broadley <b...@cse.ucdavis.edu> writes:

It seems the half-life of knowledge on the list has decayed to
two weeks :)

I've commented in detail on this (non-)issue on 2014-08-20:

http://www.open-mpi.org/community/lists/users/2014/08/25090.php

A change in the FAQ and a fix in the code would really be nice
at this stage.

Roland

-------
http://www.q-leap.com / http://qlustar.com
          --- HPC / Storage / Cloud Linux Cluster OS ---

    Bill> I've set up several clusters over the years with OpenMPI.  I
    Bill> often get the below error:

    Bill>    WARNING: It appears that your OpenFabrics subsystem is
    Bill>    configured to only allow registering part of your physical
    Bill>    memory.  This can cause MPI jobs to run with erratic
    Bill>    performance, hang, and/or crash.  ...
    Bill>    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

    Bill>      Local host:          c2-31
    Bill>      Registerable memory: 32768 MiB
    Bill>      Total memory:        64398 MiB

    Bill> I'm well aware of the normal fixes, and have implemented them
    Bill> in puppet to ensure compute nodes get the changes.  To be
    Bill> paranoid I've implemented all the changes, and they all worked
    Bill> under ubuntu 13.10.

    Bill> However with ubuntu 14.04 it seems like it's not working, thus
    Bill> the above message.

    Bill> As recommended by the FAQs I've implemented:
    Bill> 1) ulimit -l unlimited in /etc/profile.d/slurm.sh
    Bill> 2) PropagateResourceLimitsExcept=MEMLOCK in slurm.conf
    Bill> 3) UsePAM=1 in slurm.conf
    Bill> 4) in /etc/security/limits.conf
    Bill>    * hard memlock unlimited
    Bill>    * soft memlock unlimited
    Bill>    * hard stack unlimited
    Bill>    * soft stack unlimited

    Bill> My changes seem to be working, if I submit this to slurm:
    Bill> #!/bin/bash -l
    Bill> ulimit -l
    Bill> hostname
    Bill> mpirun bash -c ulimit -l
    Bill> mpirun ./relay 1 131072

    Bill> I get:
    Bill>    unlimited
    Bill>    c2-31
    Bill>    unlimited
    Bill>    unlimited
    Bill>    unlimited
    Bill>    unlimited
    Bill>    <above error message only 32GB of Registerable memory>
    Bill>    <output of mpirun relay>
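
A caveat on the test script quoted above: in "bash -c ulimit -l", bash
takes only "ulimit" as the command string and assigns "-l" to $0, so
that line reports the default ulimit resource (file size) rather than
the locked-memory limit. A minimal sketch of the same check with the
command string quoted could look like the following. The function names
child_memlock, is_unlimited and check_under_mpirun are hypothetical, and
the mpirun helper assumes mpirun is on PATH inside the job; it is
defined here but not called.

```shell
#!/bin/bash
# Sketch: report the locked-memory limit that a child process actually sees.

# Print the soft and hard RLIMIT_MEMLOCK values of a freshly spawned shell.
# The command string is quoted so that "-l" is an argument to ulimit and
# not the new shell's $0.
child_memlock() {
    bash -c 'echo "soft=$(ulimit -S -l) hard=$(ulimit -H -l)"'
}

# Succeed only if the given value is the literal string "unlimited".
is_unlimited() {
    [ "$1" = "unlimited" ]
}

# Same check for every rank started by mpirun (assumption: mpirun is on
# PATH in the job environment). Not called automatically.
check_under_mpirun() {
    mpirun bash -c 'echo "$(uname -n): $(ulimit -l)"'
}

# Report what a child of the current shell sees on this node.
echo "$(uname -n): $(child_memlock)"
```

Inside a batch script one would call check_under_mpirun after the
plain report and compare the per-rank values with the first line.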

    Bill> Is there some new kernel parameter, ofed parameter, or similar
    Bill> that controls locked pages now?  The kernel is 3.13.0-36 and
    Bill> the libopenmpi-dev package is 1.6.5.
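
For Mellanox mlx4 adapters, the Open MPI OpenFabrics FAQ gives the
registerable memory as (2^log_num_mtt) * (2^log_mtts_per_seg) *
PAGE_SIZE, where the first two are mlx4_core module parameters. The
sketch below only evaluates that formula. The original post does not say
which adapter or module settings are in use, so the example values are
assumptions; they merely show that the 32768 MiB in the warning is
consistent with, for instance, log_num_mtt=20 and log_mtts_per_seg=3 at
a 4 KiB page size. The helper names max_reg_mib and mlx4_param are
hypothetical.

```shell
#!/bin/bash
# Sketch: registerable memory implied by mlx4_core MTT settings, using
#   max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE

# Usage: max_reg_mib LOG_NUM_MTT LOG_MTTS_PER_SEG PAGE_SIZE_BYTES
# Prints the result in MiB.
max_reg_mib() {
    echo $(( (1 << $1) * (1 << $2) * $3 / 1048576 ))
}

# Print an mlx4_core module parameter, or nothing if it is not readable
# (for example when the module is not loaded on this node).
mlx4_param() {
    f="/sys/module/mlx4_core/parameters/$1"
    if [ -r "$f" ]; then
        cat "$f"
    fi
}

# Example values (assumed, not taken from the original post):
max_reg_mib 20 3 4096    # 32768
max_reg_mib 24 0 4096    # 65536

# If the parameters are readable on this node, evaluate the formula for them.
n="$(mlx4_param log_num_mtt)"
s="$(mlx4_param log_mtts_per_seg)"
if [ -n "$n" ] && [ -n "$s" ]; then
    echo "this node: $(max_reg_mib "$n" "$s" "$(getconf PAGE_SIZE)") MiB"
fi
```

If the computed value is below the node's physical memory, the warning
would be expected regardless of the ulimit -l settings.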

    Bill> Since the ulimit -l is getting to both the slurm launched
    Bill> script and also to the mpirun launched binaries I'm pretty
    Bill> puzzled.

    Bill> Any suggestions?
    Bill> _______________________________________________
    Bill> users mailing list
    Bill> us...@open-mpi.org
    Bill> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
    Bill> Link to this post:
    Bill> http://www.open-mpi.org/community/lists/users/2014/10/25544.php
