Hi Mike,

In this file:

$ cat /etc/security/limits.conf

do you see the following at the end?

* hard memlock unlimited
* soft memlock unlimited
# -- All InfiniBand Settings End here --
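
If you want to confirm that quickly on every compute node, something like the loop below should show the tail of the file everywhere (the hostnames are just placeholders for your nodes):

for h in node01 node02; do
    echo "== $h =="
    ssh "$h" 'tail -n 5 /etc/security/limits.conf'
done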

-Tom
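
P.S. If the limits look right, it might also be worth checking which transport the job is actually selecting at scale; something like the run below (just a sketch, your rank count, hostfile, and test binary will differ) turns up the BTL/MTL selection verbosity so you can see whether openib or PSM gets picked when the alltoall starts:

mpirun -np 256 --hostfile hosts \
    --mca btl_base_verbose 100 --mca mtl_base_verbose 100 \
    ./your_alltoall_test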

> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di Domenico
> Sent: Thursday, March 10, 2016 8:55 AM
> To: Open MPI Users
> Subject: [OMPI users] locked memory and queue pairs
> 
> When I try to run an Open MPI job with >128 ranks (16 ranks per node)
> using alltoall or alltoallv, I get an error that the process was
> unable to get a queue pair.
> 
> I've checked the max locked memory settings across my machines:
> 
> - ulimit -l inside and outside of mpirun is set to unlimited
> - the PAM configuration loads pam_limits.so and it is working
> - /etc/security/limits.conf sets the soft/hard memlock limits to unlimited
> 
> I tried a couple of quick MPI config settings I could think of:
> 
> -mca mtl ^psm            no effect
> -mca btl_openib_flags 1  no effect
> 
> The Open MPI FAQ says to tweak some MTT values in /sys, but since I'm
> not on Mellanox hardware that doesn't apply to me.
> 
> The machines are RHEL 6.7, kernel 2.6.32-573.12.1 (with the bundled OFED),
> running QLogic single-port InfiniBand cards with PSM enabled.
> 
> Other collectives seem to run okay; it appears to be only the alltoall
> communications that fail, and only at scale.
> 
> I believe (but can't prove) that this worked at one point, but I can't
> recall when I last tested it, so it's reasonable to assume that some
> change to the system is preventing it from working now.
> 
> The question is: where should I start poking to find it?
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28673.php
