When I try to run an Open MPI job with more than 128 ranks (16 ranks per node) using MPI_Alltoall or MPI_Alltoallv, I get an error saying the process was unable to get a queue pair.
Things I've already checked:

- Max locked memory settings across my machines: ulimit -l, run both inside and outside of mpirun, reports unlimited everywhere.
- PAM modules, to ensure pam_limits.so is loaded and working.
- /etc/security/limits.conf sets both soft and hard memlock to unlimited.
- A couple of quick MPI config settings I could think of: "-mca mtl ^psm" had no effect, and "-mca btl_openib_flags 1" had no effect.
- The Open MPI FAQ says to tweak some MTT values in /sys, but since I'm not on Mellanox hardware that doesn't apply to me.

The machines are RHEL 6.7, kernel 2.6.32-573.12.1 (with the bundled OFED), running QLogic single-port InfiniBand cards with PSM enabled.

Other collectives seem to run fine; it appears that only the alltoall communications fail, and only at scale. I believe (but can't prove) that this worked at one point, though I can't recall when I last tested it, so it's reasonable to assume some change to the system is preventing it. The question is: where should I start poking to find it?
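For reference, here is a sketch of the limit checks described above. The hostnames and rank count are placeholders, and the mpirun invocation assumes a stock Open MPI install where mpirun is on the PATH; adjust for your environment.

```shell
#!/bin/sh
# Locked-memory limit in the current (interactive) shell
ulimit -l

# The same limit as seen by processes launched under mpirun; ssh/PAM can
# apply different limits to non-interactive sessions, so this can differ
# from the interactive value. Guarded so the script still runs without MPI.
# (node01,node02 are placeholder hostnames)
if command -v mpirun >/dev/null 2>&1; then
    mpirun -np 2 --host node01,node02 sh -c 'ulimit -l'
else
    echo "mpirun not found; skipping in-job limit check"
fi

# Confirm limits.conf actually carries the memlock entries
grep -i memlock /etc/security/limits.conf 2>/dev/null || true
```

Comparing the interactive and under-mpirun values of ulimit -l on every node is the quickest way to spot a node where pam_limits.so is silently not applied.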