Hi Mike,

In this file:

$ cat /etc/security/limits.conf
...

do you see at the end:
* hard memlock unlimited
* soft memlock unlimited
# -- All InfiniBand Settings End here --

?

-Tom

> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di Domenico
> Sent: Thursday, March 10, 2016 8:55 AM
> To: Open MPI Users
> Subject: [OMPI users] locked memory and queue pairs
>
> when i try to run an openmpi job with >128 ranks (16 ranks per node)
> using alltoall or alltoallv, i'm getting an error that the process was
> unable to get a queue pair.
>
> i've checked the max locked memory settings across my machines:
>
> - using ulimit -l inside and outside of mpirun, they're all set to unlimited
> - pam modules, to ensure pam_limits.so is loaded and working
> - /etc/security/limits.conf is set for soft/hard memlock to unlimited
>
> i tried a couple of quick mpi config settings i could think of:
>
> -mca mtl ^psm            no effect
> -mca btl_openib_flags 1  no effect
>
> the openmpi faq says to tweak some mtt values in /sys, but since i'm
> not on mellanox that doesn't apply to me
>
> the machines are rhel 6.7, kernel 2.6.32-573.12.1 (with bundled ofed),
> running on qlogic single-port infiniband cards, psm is enabled
>
> other collectives seem to run okay; it seems to be only alltoall comms
> that fail, and only at scale
>
> i believe (but can't prove) that this worked at one point, but i can't
> recall when i last tested it. so it's reasonable to assume that some
> change to the system is preventing this.
>
> the question is, where should i start poking to find it?
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28673.php
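The ulimit check described in the thread can be scripted so that it runs under mpirun itself, showing the limit each remote MPI process actually inherits rather than the limit of an interactive login shell. A minimal sketch (the script name and hostfile are hypothetical; one copy runs per rank):

```shell
#!/bin/sh
# check_memlock.sh -- print this node's locked-memory limits as seen
# by the launched process. Running it under mpirun, e.g.
#   mpirun -np 16 --hostfile hosts ./check_memlock.sh
# reveals whether "unlimited" from limits.conf really propagates to
# ranks started by remote daemons (a common gap when pam_limits.so
# is not applied to non-interactive sessions).
echo "$(hostname): memlock soft=$(ulimit -S -l) hard=$(ulimit -H -l)"
```

If any node reports a small value here while a login shell on the same node reports unlimited, the limit is being lost on the non-interactive launch path rather than in limits.conf itself.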