Dear Users,

I'm measuring barrier synchronization performance on the v1.5.1 build of 
Open MPI, on a single node with 5 to 10 processes. The results, listed below, 
look poor.

Testing procedure: start the timer just before the process enters the barrier 
and stop it when the process leaves the barrier. Repeat N times and calculate 
the average.
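
For reference, the timing loop can be sketched roughly as below. This is only 
an illustration of the procedure, not my exact test code: the names now_ms, 
average_ms and placeholder_op are made up for this sketch, and it uses a plain 
POSIX clock with an empty placeholder operation so that it compiles without 
MPI. In the real test the timed operation would be MPI_Barrier(MPI_COMM_WORLD), 
and MPI_Wtime() could serve as the clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in milliseconds, read from a monotonic POSIX clock. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Hypothetical helper: run `op` n times, timing each call separately,
 * and return the mean duration in milliseconds (0.0 if n <= 0). */
static double average_ms(void (*op)(void), int n)
{
    assert(op != NULL);
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double start = now_ms();    /* start the timer before the operation */
        op();                       /* in the MPI test: the barrier call */
        total += now_ms() - start;  /* stop the timer when it returns */
    }
    return n > 0 ? total / n : 0.0;
}

/* Placeholder standing in for the barrier; it does nothing. */
static void placeholder_op(void)
{
}
```

Calling average_ms(placeholder_op, 1000) from main returns the mean time per 
call in milliseconds. In the MPI version each process would run the same loop 
between MPI_Init and MPI_Finalize.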

1 Node 5 processes: 299.38ms
1 Node 7 processes: 513.95ms
1 Node 10 processes: 749.94ms

I am wondering whether this is the expected performance on a single node. I 
presume Open MPI automatically uses shared memory for barrier synchronization 
when all processes are on one node, which I would expect to perform better 
than this. Is there a way to determine which transport layer I am using? I 
would also greatly appreciate any tips on how to tune this performance.

Regards,
Zuwei





