Hi,

An application compiled with OpenMPI 5.0.2 or 5.0.3 runs fine only if the "mpirun -mca pml ob1" option is used. If any other option is given, such as "-mca pml ucx" or various btl options, or if no option is given at all, it fails with the following error:
[n1:00000] *** An error occurred in MPI_Isend
[n1:00000] *** reported by process [2874540033,16]
[n1:00000] *** on communicator MPI_COMM_WORLD
[n1:00000] *** MPI_ERR_TAG: invalid tag
[n1:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[n1:00000] ***    and MPI will try to terminate your MPI job as well)

The job is running on only one node.

OpenMPI 5.0.2 was compiled against UCX 1.17.0 with the following configure options:

--with-knem=/opt/knem-1.1.4.90mlnx2 \
--with-ofi=/opt/libfabric/1.13.1 \
--with-ucx=/openmpi/ucx/1.17.0/g131xpmt \
--with-pbs=/opt/pbs \
--with-threads=pthreads \
--without-lsf --without-cuda \
--with-libevent=/openmpi/libevent/2.1.12 --with-libevent-libdir=/openmpi/libevent/2.1.12/lib \
--with-hwloc=/openmpi/hwloc/2.11.0/g131 --with-hwloc-libdir=/openmpi/hwloc/2.11.0/lib \
--with-pmix=/openmpi/pmix/502/g131 --with-pmix-libdir=/openmpi/pmix/502/lib \
--enable-shared --enable-static --enable-mt \
--enable-mca-no-build=btl-usnic

Each node has a 4X HDR card installed:

CA 'mlx5_0'
        CA type: MT4123
        Number of ports: 1
        Firmware version: 20.33.1048
        Hardware version: 0
        Node GUID: 0x88e9a4ffff6f0680
        System image GUID: 0x88e9a4ffff6f0680
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 200
                Base lid: 37
                LMC: 0
                SM lid: 1
                Capability mask: 0xa651e848
                Port GUID: 0x88e9a4ffff6f0680
                Link layer: InfiniBand

Can anybody help me understand why the application works only with "-mca pml ob1" and not with the other options?
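In case it is relevant: since the error is MPI_ERR_TAG, I wondered whether the largest tag the application passes to MPI_Isend exceeds what some transports accept. The tag upper bound is implementation-dependent, and pml/ucx may advertise a smaller MPI_TAG_UB than ob1, so a tag that is legal under ob1 could be rejected under ucx. A minimal sketch to print the bound at runtime (a generic check, not my actual application):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, flag;
    int *tag_ub;   /* the MPI_TAG_UB attribute value is a pointer to int */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI_TAG_UB is a predefined attribute on MPI_COMM_WORLD holding the
     * largest tag value this MPI library accepts for the current run. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (rank == 0) {
        if (flag)
            printf("MPI_TAG_UB = %d\n", *tag_ub);
        else
            printf("MPI_TAG_UB attribute not set\n");
    }

    MPI_Finalize();
    return 0;
}

Running this once with "mpirun -mca pml ob1" and once with "mpirun -mca pml ucx" would show whether the two pmls report different tag ranges.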