I took the best result from each version; that's why different algorithm
numbers were chosen.
I've studied the matter a bit further and here's what I got:
With Open MPI 1.5.4, these are the average times:
/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile
hosts_all2all_4 -npernode
FWIW:
1. There were definitely some issues with binding to cores and process layouts
on Opterons that should be fixed in the 1.5.5 that was finally released today.
2. It is strange that the performance of barrier is so different between
1.5.4 and 1.5.5. Is there a reason you were choosing
Pavel,
MVAPICH implements multicore-optimized collectives, which perform substantially
better than the default algorithms.
FYI, the ORNL team is working on a new high-performance collectives framework
for OMPI. The framework provides a significant boost in collectives performance.
Regards,
Pavel (Pasha) Shami
I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters; that's why
I didn't use --bind-to-core. I checked, and using --bind-to-core
improved the result compared to 1.5.4:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
        1000        84.96        85.08        85.02
So I gu
I don't see where you told OMPI to --bind-to-core. We don't automatically bind,
so you have to explicitly tell us to do so.
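
As a rough illustration of the point above, here is a minimal sketch of a wrapper that requests binding explicitly. The process count and the path to the IMB binary are placeholders, not values from this thread. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Compose an mpirun command line that asks for core binding explicitly.
# NP and IMB_BIN are hypothetical placeholders; adjust them for your system.
NP="${NP:-4}"
IMB_BIN="${IMB_BIN:-./IMB-MPI1}"

bind_cmd() {
    # $1 = number of processes, $2 = path to the IMB binary.
    # --bind-to-core pins each rank to one core;
    # --report-bindings prints the resulting layout so it can be checked.
    echo "mpirun --bind-to-core --report-bindings -np $1 $2 Barrier"
}

CMD=$(bind_cmd "$NP" "$IMB_BIN")
echo "$CMD"

# Launch only when explicitly requested and mpirun is on PATH.
if [ "${RUN:-0}" = "1" ] && command -v mpirun >/dev/null 2>&1; then
    $CMD
fi
```

Checking the --report-bindings output first is a cheap way to confirm that the ranks really ended up bound before comparing timings.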
On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> Hello
>
> I'm doing some testing with IMB and discovered a strange thing:
>
> Since I have a system with new AMD opte
Hello
I'm doing some testing with IMB and discovered a strange thing:
Since I have a system with new AMD Opteron 6276 processors, I'm using
1.5.5rc3 because it supports binding to cores.
But when I run the barrier test from the Intel MPI Benchmarks, the best I get
is:
#repetitions t_min[usec] t_max[us