I apologize in advance for the size of the example source and the length of this email, but this has been a pain to track down.
Our application makes fairly extensive use of System V style shared memory, and we have recently found that in certain circumstances OpenMPI appears to provide ranks with stale data.

The attached archive contains sample code that demonstrates the issue. It has a subroutine that uses a shared memory array to broadcast from a single rank on one compute node to a single rank on every other compute node. The first call sends all 1s, the second all 2s, and so on. The receiving rank(s) get all 1s on the first execution, but on subsequent executions they receive a mix of 2s and 1s, then a mix of 3s, 2s, and 1s. The archive contains a version of this routine in both C and Fortran, but only the Fortran version appears to exhibit the problem.

I've tried this with OpenMPI 3.1.5, 4.0.2, and 4.0.4, on two systems with very different configurations, and both show the problem. On one of the machines it only appears to happen when MPI is initialized via mpi4py, so I've included that in the test as well. Other than that, the behavior is very consistent across machines: when run with the same number of ranks and the same array size, the two machines even show the invalid values at the same indices.

Please let me know if you need any additional information.

Thanks,
Patrick
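For anyone reading without the attachment, here is a minimal sketch in C of the broadcast pattern the test exercises. This is my illustration, not the attached code (which also has the Fortran variant that actually fails); the array size, loop count, and the use of MPI_Comm_split_type to pick one leader rank per node are assumptions.

/* Hypothetical sketch of the failing pattern (names and sizes are
 * illustrative, not from the attached test).  Each node leader
 * attaches a System V shared memory segment; the root leader fills
 * its segment with the call count and broadcasts to the leaders on
 * the other nodes, which verify what they read back from shared
 * memory. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define N 1024  /* assumed array size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* One communicator per node; rank 0 of each is the node leader. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Communicator containing just the node leaders. */
    MPI_Comm leader_comm;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);

    if (node_rank == 0) {
        /* System V segment holding the broadcast buffer. */
        int shmid = shmget(IPC_PRIVATE, N * sizeof(int),
                           IPC_CREAT | 0600);
        int *buf = (int *)shmat(shmid, NULL, 0);

        int leader_rank;
        MPI_Comm_rank(leader_comm, &leader_rank);

        for (int call = 1; call <= 3; call++) {
            if (leader_rank == 0)
                for (int i = 0; i < N; i++)
                    buf[i] = call;   /* send all 1s, then all 2s, ... */

            MPI_Bcast(buf, N, MPI_INT, 0, leader_comm);

            if (leader_rank != 0)    /* expect every element == call */
                for (int i = 0; i < N; i++)
                    if (buf[i] != call)
                        printf("rank %d call %d: buf[%d] = %d (stale)\n",
                               world_rank, call, i, buf[i]);
        }
        shmdt(buf);
        shmctl(shmid, IPC_RMID, NULL);
    }

    MPI_Finalize();
    return 0;
}

In the failure mode described above, the stale-value check on the receiving leaders would report old call counts at some indices from the second iteration onward.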
Attachment: shmemTest.tgz (application/compressed-tar)