Hello Ray,

A few questions to help us better understand the problem:

- Are you running the benchmark on a single node?
- If the answer to that question is yes, could you try using UCX for messaging (mpirun --mca pml ucx; see the invocation sketch appended below) and see if you still observe the hang?

Also, it would help to open an issue for this problem at https://github.com/open-mpi/ompi/issues

Thanks,

Howard

On 1/14/26, 9:46 AM, "[email protected] on behalf of Sheppard, Raymond W" <[email protected]> wrote:

Hi,
  Ray Sheppard here, wearing my SPEC hat. We received a mail from AMD that we are not sure how to deal with, so I thought I would pass it along in case anyone has relevant thoughts about it. It looks like Jeff S. filed the issue they cite. We are sort of fishing for a response to them, so any info is appreciated. Thanks.
      Ray

Dear Support,

I am an engineer at AMD currently running the SPECMPI2007 benchmarks, and we are experiencing issues with the 122.Tachyon benchmark when it is compiled with OpenMPI 5. Our goal is to run SPECMPI with OpenMPI 5 to minimize the overhead of MPI in our benchmarking.

In our usual configuration, running the benchmark on 256 ranks using OpenMPI 5 with the cross-memory attach (CMA) fabric, the 122.Tachyon benchmark appears to deadlock. This issue does not occur when running Tachyon with OpenMPI 4.1.8 and the UCX fabric.

On investigating further, we observe:

- With MPICH v4.3.0 the benchmark fails to run because MPICH detects an MPI error: an MPI_Allgather() call uses the same array for the send and receive buffers, which is disallowed by the MPI spec (a sketch of the in-place fix is appended below).
- After modifying the benchmark to correct the Allgather call, MPICH runs to completion but crashes at finalization, while OpenMPI still deadlocks.
- The deadlock is only observed when running on >35 ranks and is present in multiple versions of OpenMPI (v5.0.5, v5.0.8).

We discovered this issue for OpenMPI while investigating https://github.com/open-mpi/ompi/issues/12979, which may be relevant.

Is this a known issue with the 122.Tachyon benchmark, and are you able to help us run 122.Tachyon on OpenMPI 5? Thank you in advance for your help. If you require any further information, please do not hesitate to reach out to me.

Thanks,

James
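
For reference, a sketch of the sort of invocation Howard is suggesting; the rank count, binary name, and arguments are placeholders, not taken from the original report:

    # Select the UCX point-to-point layer (pml) explicitly instead of the
    # default ob1 + shared-memory/CMA path.  Rank count and binary name
    # are placeholders.
    mpirun -np 256 --mca pml ucx ./122.tachyon <benchmark arguments>

    # Optional: increase pml framework verbosity to confirm which pml
    # component was actually selected at startup.
    mpirun -np 256 --mca pml ucx --mca pml_base_verbose 10 ./122.tachyon <benchmark arguments>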
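
On the MPI_Allgather point in James's report: the usual standard-conforming way to gather into a buffer that already holds the local contribution is MPI_IN_PLACE. A minimal sketch, assuming one element per rank in a hypothetical table array (the names and counts are illustrative, not taken from the Tachyon source):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal sketch of replacing an allgather that aliases its send and
     * receive buffers with the MPI_IN_PLACE form.  The array name and the
     * one-element-per-rank layout are illustrative, not from Tachyon. */
    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *table = malloc(size * sizeof(double));
        table[rank] = (double)rank;   /* each rank fills only its own slot */

        /* Non-conforming pattern (same array passed as sendbuf and recvbuf),
         * which MPICH rejects at runtime:
         *
         *   MPI_Allgather(&table[rank], 1, MPI_DOUBLE,
         *                 table,        1, MPI_DOUBLE, MPI_COMM_WORLD);
         *
         * Conforming equivalent: MPI_IN_PLACE tells the library that rank i's
         * contribution already sits in block i of the receive buffer; the
         * send count and datatype arguments are ignored in this form.       */
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                      table, 1, MPI_DOUBLE, MPI_COMM_WORLD);

        if (rank == 0)
            printf("last entry after allgather: %g\n", table[size - 1]);

        free(table);
        MPI_Finalize();
        return 0;
    }

This only addresses the aliasing that MPICH flags; per James's note, the Open MPI hang persisted even after the Allgather call was corrected, so that part still looks worth an issue on the Open MPI tracker.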
