Hi,
A small update.
My colleague made a mistake and there is no arithmetic performance
issue. Sorry for bothering you.
Nevertheless, we do observe performance differences between MPICH and
OpenMPI of 25% to 100%, depending on the options we use in our
software. The tests are run on a single SGI node with 6 or 12 processes,
so I am focusing on the sm option.
So, I have two questions:
1/ Can the option --mca mpool_sm_max_size=XXXX change anything? (I am
wondering whether the value is too small and, as a consequence, a set of
small messages is sent instead of one big one.)
2/ Is there a difference between --mca btl tcp,sm,self and --mca btl
self,sm,tcp (or not giving any explicit mca option)?
Best regards,
Mathieu.
On 12/05/2010 06:10 PM, Eugene Loh wrote:
Mathieu Gontier wrote:
Dear OpenMPI users
I am dealing with an arithmetic problem. In fact, I have two variants
of my code: one in single precision, one in double precision. When I
compare the two executables built with MPICH, I observe the expected
difference in performance: 115.7-sec in single precision against
178.68-sec in double precision (+54%).
The thing is, when I use OpenMPI, the difference is noticeably larger:
238.5-sec in single precision against 403.19-sec in double precision
(+69%).
Our experience has already shown that OpenMPI is less efficient than
MPICH on Ethernet with a small number of processes. This explains the
differences between the first set of results with MPICH and the
second set with OpenMPI. (But if someone has more information about
that, or even a solution, I am of course interested.)
But using OpenMPI also increases the difference between the two
arithmetics. Is it an accentuation of the OpenMPI+Ethernet loss of
performance, is it another issue in OpenMPI, or is there any option
I can use?
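For reference, the percentages quoted above can be rechecked with a few lines of Python. This is only a sketch: the four timings are copied from the message, while the dictionary layout and the helper name `percent_increase` are illustrative choices, not part of the original code.

```python
# Timings in seconds, copied from the message above.
timings = {
    "MPICH": {"single": 115.7, "double": 178.68},
    "OpenMPI": {"single": 238.5, "double": 403.19},
}


def percent_increase(base, other):
    """Relative increase of `other` over `base`, in percent (hypothetical helper)."""
    return (other - base) / base * 100.0


# Cost of double precision within each MPI implementation.
precision_penalty = {
    impl: percent_increase(t["single"], t["double"])
    for impl, t in timings.items()
}

# Slowdown of OpenMPI relative to MPICH at each precision.
ompi_slowdown = {
    prec: percent_increase(timings["MPICH"][prec], timings["OpenMPI"][prec])
    for prec in ("single", "double")
}

for impl, pct in precision_penalty.items():
    print(f"{impl}: double vs single +{pct:.0f}%")
for prec, pct in ompi_slowdown.items():
    print(f"{prec}: OpenMPI vs MPICH +{pct:.0f}%")
```

With these numbers the double-precision penalty comes out at +54% for MPICH and +69% for OpenMPI, and the OpenMPI runs are about 106% (single) and 126% (double) slower than the MPICH ones.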
It is also unusual that the performance difference between MPICH and
OMPI is so large. You say that OMPI is slower than MPICH even at
small process counts. Can you confirm that this is because MPI calls
are slower? Some of the biggest performance differences I've seen
between MPI implementations had nothing to do with the performance of
MPI calls at all. They had to do with process binding or other factors
that impacted the computational (non-MPI) performance of the code.
The performance of MPI calls was basically irrelevant.
In this particular case, I'm not convinced since neither OMPI nor
MPICH binds processes by default.
Still, can you do some basic performance profiling to confirm what
aspect of your application is consuming so much time? Is it a
particular MPI call? If your application is spending almost all of
its time in MPI calls, do you have some way of judging whether the
faster performance is acceptable? That is, is 238 secs acceptable and
403 secs slow? Or, are both timings unacceptable -- e.g., the code
"should" be running in about 30 secs.
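One very basic way to get that breakdown is to accumulate wall-clock time separately for the computational part and for the communication part. The sketch below is in plain Python and is only an illustration of the idea: `timed`, `compute_step` and `communication_step` are hypothetical names, and the communication step is a `time.sleep` placeholder standing in for a real MPI call. In the real application the same bracketing could be done around the actual MPI calls, for example with `MPI_Wtime`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Seconds accumulated per label ("compute", "mpi", ...).
elapsed = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to elapsed[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed[label] += time.perf_counter() - start


def compute_step():
    # Placeholder for the numerical kernel of the application.
    return sum(i * i for i in range(50_000))


def communication_step():
    # Placeholder for an MPI call (assumption: no MPI library is used here).
    time.sleep(0.01)


for _ in range(5):
    with timed("compute"):
        compute_step()
    with timed("mpi"):
        communication_step()

total = sum(elapsed.values())
mpi_fraction = elapsed["mpi"] / total
print(f"compute: {elapsed['compute']:.3f} s, mpi: {elapsed['mpi']:.3f} s")
print(f"fraction of time in communication: {mpi_fraction:.0%}")
```

If the communication fraction is small in both the MPICH and the OMPI runs, the difference is more likely to come from the computational side (binding, memory placement) than from the MPI calls themselves.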