You should:

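- do N warmup barriers (untimed)
- start the timer
- do M barriers back to back (M should be large, so that timer overhead is amortized)
- stop the timer
- divide the elapsed time by M to get the average time per barrier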
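
As a rough sketch of that procedure, here is a small self-contained Python version. It is only an illustration: Python threads and threading.Barrier stand in for MPI processes and MPI_Barrier, and the function name time_barrier and its default values are made up for this example. In a real Open MPI test you would put MPI_Barrier in the loops and read the clock with MPI_Wtime.

```python
import threading
import time


def time_barrier(num_workers=5, warmup=100, iterations=5000):
    """Return the average seconds per barrier measured by each worker.

    Hypothetical helper: threads stand in for MPI ranks here.
    """
    barrier = threading.Barrier(num_workers)
    averages = [0.0] * num_workers

    def worker(rank):
        # N warmup barriers: untimed, so one-time startup costs are excluded.
        for _ in range(warmup):
            barrier.wait()
        # Start the timer once, outside the loop.
        start = time.perf_counter()
        # M timed barriers, back to back.
        for _ in range(iterations):
            barrier.wait()
        # Stop the timer and divide the elapsed time by M.
        elapsed = time.perf_counter() - start
        averages[rank] = elapsed / iterations

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return averages


if __name__ == "__main__":
    per_worker = time_barrier()
    # Report the slowest worker's average, in microseconds.
    print(f"max average per barrier: {max(per_worker) * 1e6:.2f} us")
```

The point of timing the whole loop rather than each barrier is that the clock is read only twice, so its cost and resolution are spread over M calls. The absolute numbers from this threaded sketch say nothing about Open MPI itself.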

Benchmarking is tricky to get right. 

On Feb 23, 2011, at 11:54 PM, "Li Zuwei" <lzu...@dso.org.sg> wrote:

> Dear Users,
> 
> I'm measuring barrier synchronization performance on the v1.5.1 build of 
> Open MPI. I am currently trying to measure synchronization performance on a 
> single node, with 5 processes. I'm getting pretty weak results, as follows:
> 
> Testing procedure: start the timer at the start of the barrier and stop 
> it when the process breaks out of the barrier. Repeat this N times and 
> calculate the average.
> 
> 1 Node 5 processes: 299.38ms
> 1 Node 7 processes: 513.95ms
> 1 Node 10 processes: 749.94ms
> 
> I am wondering if this is the expected performance on a single node. I 
> presume Open MPI automatically uses shared memory for barrier synchronization 
> on a single node, which I think should provide better performance. Is there 
> a way to determine which transport layer I am using? I would also greatly 
> appreciate tips on how I can tune this performance.
> 
> Regards,
> Zuwei
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
