You should:

- do N warmup barriers
- start the timers
- do M barriers (M should be a lot)
- stop the timers
- divide the time by M
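Here is a minimal C sketch of the recipe above, assuming a POSIX system. `bench_average` and `wall_seconds` are hypothetical names introduced for illustration. In an MPI program you would pass a small wrapper around `MPI_Barrier` as the operation and `MPI_Wtime` as the clock, as the comment at the end shows. The key difference from timing each barrier individually is that the clock is read only twice, around all M iterations, so timer overhead and jitter are spread over the whole run.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds (POSIX). */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: average time per call of op, in seconds.
 * op:     the operation to measure (e.g. a wrapper around MPI_Barrier)
 * ctx:    opaque argument passed to op
 * warmup: N untimed iterations that absorb one-time setup costs
 * iters:  M timed iterations (should be large)
 * now:    clock returning seconds as a double (MPI_Wtime has this shape)
 * Returns 0.0 if iters <= 0. */
double bench_average(void (*op)(void *), void *ctx,
                     int warmup, int iters, double (*now)(void))
{
    assert(op != NULL && now != NULL);

    /* Step 1: N warmup iterations, not timed. */
    for (int i = 0; i < warmup; i++)
        op(ctx);

    if (iters <= 0)
        return 0.0;  /* nothing to average */

    /* Step 2: read the clock once, before the timed loop. */
    double start = now();

    /* Step 3: M back-to-back iterations. */
    for (int i = 0; i < iters; i++)
        op(ctx);

    /* Steps 4-5: read the clock once more and divide by M. */
    return (now() - start) / iters;
}

/* Possible MPI usage (assumes <mpi.h> and MPI_Init has been called):
 *
 *   static void barrier_op(void *ctx) { MPI_Barrier(*(MPI_Comm *)ctx); }
 *   ...
 *   MPI_Comm comm = MPI_COMM_WORLD;
 *   double avg = bench_average(barrier_op, &comm, 100, 10000, MPI_Wtime);
 */
```

Each rank computes its own average; if you want a single figure, reducing the per-rank values to one rank with MPI_Reduce and MPI_MAX is one common choice.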
Benchmarking is tricky to get right.

On Feb 23, 2011, at 11:54 PM, "Li Zuwei" <lzu...@dso.org.sg> wrote:

> Dear Users,
>
> I'm measuring barrier synchronization performance on the v1.5.1 build of
> Open MPI. I am currently trying to measure synchronization performance on a
> single node, with 5 processes. I'm getting pretty weak results, as follows:
>
> Testing procedure: start the timer when the process enters the barrier and
> stop it when the process breaks out of the barrier. Repeat this N times and
> calculate the average.
>
> 1 node, 5 processes: 299.38ms
> 1 node, 7 processes: 513.95ms
> 1 node, 10 processes: 749.94ms
>
> I am wondering if this is the expected performance on a single node. I
> presume Open MPI automatically uses shared memory for barrier synchronization
> on a single node, which I think should provide better performance. Is there
> a way to determine which transport layer I am using? I would also greatly
> appreciate tips on how I can tune this performance.
>
> Regards,
> Zuwei
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users