Hi Waleed

As Devendar said (and as I tried to explain before),
you need to set the locked memory limit to unlimited for
user processes (in /etc/security/limits.conf),
*AND* the daemon, job script, or whatever else launches the mpiexec
command must also run "ulimit -l unlimited" (directly or indirectly).
How to do the latter depends on the details of your system.
I am not familiar with SGE (I use Torque), but presumably you can
add "ulimit -l unlimited" where the SGE daemons are launched
on the nodes.
The processes launched by that daemon (i.e. your mpiexec)
should then inherit those limits;
that is how I do it with Torque.
A more brute-force way is simply to put "ulimit -l unlimited"
in your job script before mpiexec.
Inserting "ulimit -a" in your job script may help diagnose which limits
you actually have.
Please see the OMPI FAQ that I sent you before for more details.
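
A minimal sketch of the brute-force approach, in case it helps.
The program name ./my_mpi_program is only a placeholder, and whether
the ulimit call succeeds depends on the hard limit the job inherits
from whatever launched it, so treat this as a starting point rather
than a tested SGE recipe:

```shell
#!/bin/bash
# Sketch of a job script body. ./my_mpi_program is a placeholder name,
# not something taken from this thread.

# Print the soft and hard locked-memory limits of the current shell.
show_memlock() {
    echo "memlock soft=$(ulimit -S -l) hard=$(ulimit -H -l)"
}

show_memlock    # what the job inherited from whatever launched it

# Ask for an unlimited locked-memory limit. For a non-root user this
# fails if the inherited hard limit is lower than "unlimited".
if ! ulimit -l unlimited 2>/dev/null; then
    echo "could not raise the memlock limit" >&2
fi

show_memlock    # what mpiexec will actually inherit
ulimit -a       # full list of limits, for diagnosis

# Launch only if both mpiexec and the placeholder program exist here.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    mpiexec ./my_mpi_program
fi
```

If the second "memlock" line still shows a small number, the hard limit
given to the daemon is the thing to fix, not the job script.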

I hope this helps,
Gus Correa

On 01/06/2015 01:37 PM, Deva wrote:
Hi Waleed,

----------
   Memlock limit: 65536
----------

Such a low limit is probably due to the per-user locked-memory limit. Can you
make sure it is set to "unlimited" on all nodes ("ulimit -l unlimited")?
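
One possible way to check this, sketched under a few assumptions: the
function names below are made up for illustration, and the per-node
check uses mpiexec's -pernode option from the Open MPI 1.6 series, run
from inside a job so that the limits are the ones the scheduler's
daemon hands down. Note that bash prints "ulimit -l" in kbytes, while
the Open MPI message prints bytes:

```shell
#!/bin/bash
# Print this host's name and the locked-memory limit seen by this process.
report_memlock() {
    echo "$(uname -n): memlock=$(ulimit -l)"
}

# Convert a byte count (as printed in the Open MPI message) to the
# kbyte units that bash uses for "ulimit -l".
bytes_to_kbytes() {
    echo $(( $1 / 1024 ))
}

# Run the same check once per allocated node through mpiexec, so the
# value is measured in the environment the MPI processes inherit.
# Defined here but not called; call it from inside a job script.
check_all_nodes() {
    mpiexec -pernode bash -c 'echo "$(uname -n): memlock=$(ulimit -l)"'
}

report_memlock
bytes_to_kbytes 65536   # the value from the error message; prints 64
```

A node that reports 64 here matches the 65536-byte limit in the warning;
a correctly configured node should report "unlimited".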

-Devendar

On Tue, Jan 6, 2015 at 3:42 AM, Waleed Lotfy <waleed.lo...@bibalex.org> wrote:

    Hi guys,

    Sorry for getting back so late. We ran into some problems during
    the installation process. As soon as the system came up I tested
    the new versions for the problem, but they showed another
    memory-related warning.

    --------------------------------------------------------------------------
    The OpenFabrics (openib) BTL failed to initialize while trying to
    allocate some locked memory.  This typically can indicate that the
    memlock limits are set too low.  For most HPC installations, the
    memlock limits should be set to "unlimited".  The failure occured
    here:

       Local host:    comp003.local
       OMPI source:   btl_openib_component.c:1200
       Function:      ompi_free_list_init_ex_new()
       Device:        mlx4_0
       Memlock limit: 65536

    You may need to consult with your system administrator to get this
    problem fixed.  This FAQ entry on the Open MPI web site may also be
    helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    WARNING: There was an error initializing an OpenFabrics device.

       Local host:   comp003.local
       Local device: mlx4_0
    --------------------------------------------------------------------------

    <<<Then the output of the program follows.>>>

    My current running versions:

    OpenMPI: 1.6.4
    OFED-internal-2.3-2

    I checked /etc/security/limits.d/, the scheduler's configurations
    (grid engine) and tried adding the following line to
    /etc/modprobe.d/mlx4_core: 'options mlx4_core log_num_mtt=22
    log_mtts_per_seg=1' as suggested by Gus.
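
For reference, the Open MPI FAQ gives the amount of memory that can be
registered with mlx4 as (2^log_num_mtt) * (2^log_mtts_per_seg) * page
size. A small sketch of that arithmetic for the values above; the
function name is made up for illustration, and the 4096-byte fallback
page size is an assumption used only if getconf is unavailable:

```shell
#!/bin/bash
# Maximum registerable memory in bytes, per the Open MPI FAQ formula:
#   (2^log_num_mtt) * (2^log_mtts_per_seg) * page_size
max_reg_mem_bytes() {
    local log_num_mtt=$1 log_mtts_per_seg=$2 page_size=$3
    echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
}

# Use the real page size if getconf is available, else assume 4096.
page_size=$(getconf PAGE_SIZE 2>/dev/null || echo 4096)
bytes=$(max_reg_mem_bytes 22 1 "$page_size")
echo "registerable memory: $(( bytes / 1024 / 1024 / 1024 )) GiB"

# The values the loaded module is really using can be read from sysfs,
# if the mlx4_core module is loaded on this node.
for p in log_num_mtt log_mtts_per_seg; do
    f=/sys/module/mlx4_core/parameters/$p
    if [ -r "$f" ]; then
        echo "$p=$(cat "$f")"
    fi
done
```

With 4096-byte pages these settings work out to 32 GiB. If the sysfs
values differ from what is in /etc/modprobe.d/, the options were
probably not picked up when the module was loaded.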

    I am running out of ideas here, so any help is appreciated.

    P.S. I am not sure if I should open a new thread for this issue or
    continue with the current one, so please advise.

    Waleed Lotfy
    Bibliotheca Alexandrina
    _______________________________________________
    users mailing list
    us...@open-mpi.org <mailto:us...@open-mpi.org>
    Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
    Link to this post:
    http://www.open-mpi.org/community/lists/users/2015/01/26107.php




--


-Devendar

