Hello everybody, this is my first post.

I needed to analyze the communication among nodes in a CFD code, so I used vtrun from mpiexec. Next, I dumped the data (otfdump) and summed up the message volumes for the Send and Receive lines. My results astonished me: the total sent does not equal the total received. Below I present a very small, 4-process problem, but it occurs in every run, for any number of processes.

This is the sum for SendMessage. The first column is the sender, the second the receiver, the third the volume in bytes:
sender  receiver  bytes
0       0         0
0       1         33575534
0       2         17178610
0       3         17881624
1       0         75900050
1       1         0
1       2         9510508
1       3         20961830
2       0         39807134
2       1         9937288
2       2         0
2       3         30328578
3       0         32415748
3       1         33226154
3       2         55062442
3       3         0

For ReceiveMessage, the first column is the receiver, the second the sender, the third the volume in bytes:

receiver  sender  bytes
0         0       0
0         1       57682570
0         2       30912474
0         3       28154684
1         0       43260014
1         1       0
1         2       9937288
1         3       37073342
2         0       21455674
2         1       9510508
2         2       0
2         3       62425238
3         0       20559492
3         1       19374170
3         2       27494694
3         3       0

Comparing the two, you can see that the reported volumes agree only between ranks 1 and 2, in both directions. But what about the others?

I correlated the data with Vampir for this 4-process case. Its aggregated message volume comes partly from the SendMessage records and partly from the ReceiveMessage records. Below is the Vampir table, with data in MiB (rows are senders, columns are receivers). In brackets I note whether the value matches the Send (S) or the Receive (R) part I got from OTF.

      p0          p1           p2           p3
p0    -           32.02(S)     16.383(S)    17.053(S)
p1    55.01(R)    -            9.07(R/S)    18.477(R)
p2    29.48(R)    9.477(R/S)   -            26.221(R)
p3    26.85(R)    31.687(S)    52.512(S)    -

Can anybody explain this, please? Probably I am doing something wrong, or I do not understand how to interpret the data in OTF. Could otfdump be working incorrectly? Or Vampir?

Best regards,
jaross
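For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the check described above. It assumes the per-pair sums have already been extracted from the otfdump output into dictionaries. The names `SENT`, `RECEIVED`, `compare_volumes` and `to_mib` are purely illustrative and are not part of OTF, otfdump or Vampir. The numbers are the ones from the tables above, with the zero diagonal left out.

```python
# Per-pair byte totals copied from the tables in the post.
# SENT is keyed by (sender, receiver); RECEIVED is keyed by (receiver, sender).
# The zero-valued diagonal entries (rank to itself) are omitted.
SENT = {
    (0, 1): 33575534, (0, 2): 17178610, (0, 3): 17881624,
    (1, 0): 75900050, (1, 2): 9510508,  (1, 3): 20961830,
    (2, 0): 39807134, (2, 1): 9937288,  (2, 3): 30328578,
    (3, 0): 32415748, (3, 1): 33226154, (3, 2): 55062442,
}
RECEIVED = {
    (0, 1): 57682570, (0, 2): 30912474, (0, 3): 28154684,
    (1, 0): 43260014, (1, 2): 9937288,  (1, 3): 37073342,
    (2, 0): 21455674, (2, 1): 9510508,  (2, 3): 62425238,
    (3, 0): 20559492, (3, 1): 19374170, (3, 2): 27494694,
}


def to_mib(num_bytes):
    """Convert a byte count to MiB (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


def compare_volumes(sent, received):
    """Return (sender, receiver, sent_bytes, received_bytes) for each
    sender/receiver pair whose send-side and receive-side totals differ."""
    mismatches = []
    for (sender, receiver), sent_bytes in sorted(sent.items()):
        # The receive table is indexed the other way round.
        received_bytes = received.get((receiver, sender), 0)
        if sent_bytes != received_bytes:
            mismatches.append((sender, receiver, sent_bytes, received_bytes))
    return mismatches


if __name__ == "__main__":
    for sender, receiver, s, r in compare_volumes(SENT, RECEIVED):
        print(f"{sender} -> {receiver}: sent {to_mib(s):.3f} MiB, "
              f"received {to_mib(r):.3f} MiB")
```

Only the pairs 1 -> 2 and 2 -> 1 pass this check. Every other pair is reported as a mismatch.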