Re: [OMPI users] segfault in libibverbs.so

2020-07-27 Thread Gilles Gouaillardet via users
Prentice, ibverbs might be used by UCX (either pml/ucx or btl/uct), so to be 100% sure, you should run:

    mpirun --mca pml ob1 --mca btl ^openib,uct ...

In order to force btl/tcp, you need to ensure pml/ob1 is used, and then you always need the btl/self component:

    mpirun --mca pml ob1 --mca btl tcp,se

Re: [OMPI users] segfault in libibverbs.so

2020-07-27 Thread Reuti via users
On 27.07.2020 at 21:18, Prentice Bisbal via users wrote:
> Can anyone explain why my job still calls libibverbs when I run it with '-mca btl ^openib'?
I have also observed similar behavior in a mixed cluster where some nodes have InfiniBand and others do not. Even checking the node beforehand and

Re: [OMPI users] segfault in libibverbs.so

2020-07-27 Thread Prentice Bisbal via users
Can anyone explain why my job still calls libibverbs when I run it with '-mca btl ^openib'? If I instead use '-mca btl tcp', my jobs don't segfault. I would assume '-mca btl ^openib' and '-mca btl tcp' to be essentially equivalent, but there's obviously a difference between the two. Prentice On 7/
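
The two forms are not equivalent because an MCA component list prefixed with '^' is an exclusion list, while a plain list is an inclusion list. The Python sketch below models only that list semantics. The function name select_components and the set of available btl components are hypothetical, chosen for illustration; the components actually present depend on how Open MPI was built, and the real selection also involves component priorities that are not modeled here.

```python
def select_components(spec, available):
    """Model MCA include/exclude list semantics for a single framework.

    A spec starting with '^' excludes the named components and keeps the rest;
    a plain spec keeps only the named components.
    """
    names = [n for n in spec.lstrip("^").split(",") if n]
    if spec.startswith("^"):
        return [c for c in available if c not in names]
    return [c for c in available if c in names]


# Hypothetical list of btl components in a build (an assumption, not taken
# from the thread); real builds vary.
available_btls = ["self", "vader", "tcp", "openib", "uct"]

excluded = select_components("^openib", available_btls)
only_tcp = select_components("tcp", available_btls)

print(excluded)  # everything except openib remains eligible
print(only_tcp)  # only tcp remains eligible
```

Under these assumptions, '^openib' leaves every other component eligible, while 'tcp' leaves only tcp. The sketch covers the btl framework only; the pml framework is selected separately.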

[OMPI users] segfault in libibverbs.so

2020-07-23 Thread Prentice Bisbal via users
I manage a cluster that is very heterogeneous. Some nodes have InfiniBand, while others have 10 Gb/s Ethernet. We recently upgraded to CentOS 7, and built a new software stack for CentOS 7. We are using OpenMPI 4.0.3, and we are using Slurm 19.05.5 as our job scheduler. We just noticed that wh