Hi Ernesto,

Your program is erroneous from the perspective of the MPI standard, which means 
you are in "anything can happen" land. MPI implementations are typically 
optimized for performance and assume correct MPI usage by the application.
In your situation, especially if the message size is small (?), the broadcast 
from the root will result in an eager send, so the call can return on rank 0 
without any other rank taking part. Unlike a barrier, for example, a broadcast 
is not defined as a synchronizing collective.
The message from your broadcast might actually be matched by any collective 
later in the code, leading to a more obscure error pattern.

To pinpoint such correctness errors in your application code, you can use tools 
like MUST (https://itc.rwth-aachen.de/must/), which will point out inconsistent 
(and therefore erroneous) use of collective communication.

- Joachim
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Ernesto Prudencio 
via users <users@lists.open-mpi.org>
Sent: Saturday, April 2, 2022 2:29:07 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ernesto Prudencio <epruden...@slb.com>
Subject: [OMPI users] 101 question on MPI_Bcast()


I have an “extreme” case below, for the sake of example.



Suppose one is running an MPI job with N >= 2 ranks, and at a certain moment the 
code does the following:



.
.
.
if (rank == 0) {
    MPI_Bcast(…);
}
.
.
.
std::cout << "Here A, rank = " << rank << std::endl;
MPI_Barrier(…);
std::cout << "Here B, rank = " << rank << std::endl;
.
.
.



I thought rank 0 would never print the message “Here A”, because the MPI lib at 
rank 0 would be stuck on the MPI_Bcast waiting for all other ranks to notify 
(internally, in the MPI lib logic) that they have received the contents.



But this seems not to be the case. Instead, the code behaves as follows:

  1.  MPI_Bcast() returns the processing to rank 0, so it (rank 0) prints the 
“Here A” message (and all the other ranks print “Here A” as well).
  2.  All ranks get to the barrier, and then all of them print the “Here B” 
message afterwards.



Am I correct on the statements (1) and (2) above?



Thanks,



Ernesto.

