Hello, I am studying optimization strategies for codes that make a large number of communication calls.
My MPI course notes give two pieces of optimization advice that seem contradictory:

1) Use a temporary copy of the message to allow non-blocking sends and to decouple the send from the receive.
2) Avoid temporary copies of the message, because the copy adds extra cost to the execution time.

We are then advised to:

- replace MPI_SEND with MPI_SSEND (synchronous blocking send): the execution time is said to be divided by a factor of 2;
- use MPI_ISSEND and MPI_IRECV, with MPI_WAIT to synchronize (synchronous non-blocking send): the execution time is said to be divided by a factor of 3.

So which is the better optimization? Should we use a temporary copy of the message or not, and if so, in which cases?

Thanks