Hi Waleed,

As Devendar said (and as I tried to explain before), you need to set the locked memory limit to unlimited for user processes (in /etc/security/limits.conf), *AND* the daemon, job script, or whatever else launches the mpiexec command must also request "ulimit -l unlimited" (directly or indirectly). How to do the latter depends on the details of your system.

I am not familiar with SGE (I use Torque), but presumably you can add "ulimit -l unlimited" when you launch the SGE daemons on the nodes. The processes launched by that daemon (i.e. your mpiexec) should then inherit those limits; that is how I do it on Torque.

A more brute-force way is simply to include "ulimit -l unlimited" in your job script before mpiexec. Inserting a "ulimit -a" in your job script may help diagnose what limits you actually have.

Please see the OMPI FAQ that I sent you before for more details.
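The brute-force approach could look roughly like the sketch below. It is only an illustration under assumptions: the helper name report_memlock, the process count, and the program name ./my_mpi_program are hypothetical, and any scheduler directives your site needs are left out. Note that "ulimit -l unlimited" can only succeed if the hard limit (e.g. from /etc/security/limits.conf) already allows it, which is why the script reports both the soft and the hard value.

```shell
#!/bin/sh
# Sketch of a job script fragment (hypothetical names, not a site-specific recipe).

# Hypothetical helper: print the soft and hard locked-memory limits
# that this shell (and therefore anything it launches) currently has.
report_memlock() {
    echo "memlock soft=$(ulimit -S -l) hard=$(ulimit -H -l)"
}

# Show every limit the job actually inherited from the scheduler daemon.
ulimit -a

echo "Before: $(report_memlock)"

# Try to raise the locked-memory limit. This fails if the hard limit
# is lower than "unlimited", so report the outcome instead of aborting.
if ulimit -l unlimited 2>/dev/null; then
    echo "After:  $(report_memlock)"
else
    echo "Could not raise memlock; $(report_memlock)" >&2
fi

# Hypothetical launch line; uncomment and adapt to your job.
# mpiexec -n 4 ./my_mpi_program
```

If the "Before" line in the job output still shows 65536 while an interactive login on the same node shows unlimited, that would suggest the scheduler daemon itself was started with the low limit.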
I hope this helps,
Gus Correa

On 01/06/2015 01:37 PM, Deva wrote:
Hi Waleed,

----------
Memlock limit: 65536
----------

Such a low limit is probably due to the per-user locked memory limit. Can you make sure it is set to "unlimited" on all nodes ("ulimit -l unlimited")?

-Devendar

On Tue, Jan 6, 2015 at 3:42 AM, Waleed Lotfy <waleed.lo...@bibalex.org> wrote:

Hi guys,

Sorry for getting back so late, but we ran into some problems during the installation process. As soon as the system came up I tested the new versions for the problem, but it showed another memory-related warning.

--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    comp003.local
  OMPI source:   btl_openib_component.c:1200
  Function:      ompi_free_list_init_ex_new()
  Device:        mlx4_0
  Memlock limit: 65536

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   comp003.local
  Local device: mlx4_0
--------------------------------------------------------------------------

<<<Then the output of the program follows.>>>

My current running versions:

OpenMPI: 1.6.4
OFED-internal-2.3-2

I checked /etc/security/limits.d/ and the scheduler's configuration (Grid Engine), and tried adding the following line to /etc/modprobe.d/mlx4_core, as suggested by Gus:

'options mlx4_core log_num_mtt=22 log_mtts_per_seg=1'

I am running out of ideas here, so any help is appreciated.

P.S.
I am not sure if I should open a new thread for this issue or continue with the current one, so please advise.

Waleed Lotfy
Bibliotheca Alexandrina

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/01/26107.php

--
-Devendar

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/01/26109.php