Hi,

A small update.
My colleague made a mistake, and there is in fact no arithmetic performance issue. Sorry for bothering you.

Nevertheless, one can observe differences between MPICH and OpenMPI ranging from 25% to 100%, depending on the options we use in our software. The tests are run on a single SGI node with 6 or 12 processes, so I am focusing on the sm option.

So, I have two questions:
1/ Can the option --mca mpool_sm_max_size=XXXX change anything? (I am wondering whether the value is too small and, as a consequence, a set of small messages is sent instead of one big message.)
2/ Is there a difference between --mca btl tcp,sm,self and --mca btl self,sm,tcp (or not setting any explicit MCA option at all)? See the command lines below for how I compare them.
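
For reference, here is roughly how I run the comparisons (the executable name ./solver and the size value are only placeholders for our real case):

  ompi_info --param mpool sm          (to check the current mpool_sm_max_size default)
  mpirun -np 6 --mca mpool_sm_max_size 536870912 ./solver
  mpirun -np 6 --mca btl self,sm,tcp ./solver
  mpirun -np 6 --mca btl tcp,sm,self ./solver
  mpirun -np 6 ./solver               (no explicit btl list)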

Best regards,
Mathieu.

On 12/05/2010 06:10 PM, Eugene Loh wrote:
Mathieu Gontier wrote:

  Dear OpenMPI users

I am dealing with an arithmetic problem. In fact, I have two variants of my code: one in single precision, one in double precision. When I compare the two executables built with MPICH, one can observe an expected difference in performance: 115.7 sec in single precision against 178.68 sec in double precision (+54%).

The thing is, when I use OpenMPI, the difference is much bigger: 238.5 sec in single precision against 403.19 sec in double precision (+69%).

Our experience has already shown that OpenMPI is less efficient than MPICH over Ethernet with a small number of processes. This explains the difference between the first set of results with MPICH and the second set with OpenMPI. (But if someone has more information about that, or even a solution, I am of course interested.) However, using OpenMPI also widens the gap between the two arithmetics. Is this an accentuation of the OpenMPI+Ethernet performance loss, is it another issue in OpenMPI, or is there an option I can use?

It is also unusual that the performance difference between MPICH and OMPI is so large. You say that OMPI is slower than MPICH even at small process counts. Can you confirm that this is because the MPI calls themselves are slower? Some of the biggest performance differences I've seen between MPI implementations had nothing to do with the performance of MPI calls at all. They came down to process binding or other factors that impacted the computational (non-MPI) performance of the code; the performance of the MPI calls was basically irrelevant.

In this particular case, I'm not convinced since neither OMPI nor MPICH binds processes by default.

Still, can you do some basic performance profiling to confirm what aspect of your application is consuming so much time? Is it a particular MPI call? If your application is spending almost all of its time in MPI calls, do you have some way of judging whether the faster performance is acceptable? That is, is 238 secs acceptable and 403 secs slow? Or, are both timings unacceptable -- e.g., the code "should" be running in about 30 secs.
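
Just to be concrete, here is a minimal sketch of the kind of measurement I mean. The loop structure, the iteration count, and the barrier standing in for your real communication are placeholders, not your actual code:

/* Accumulate time spent in MPI calls separately from total wall time,
 * then report the worst case over all ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t_total = MPI_Wtime();
    double t_mpi = 0.0;

    for (int iter = 0; iter < 1000; ++iter) {
        /* compute_step();  placeholder for the non-MPI work of the solver */

        double t0 = MPI_Wtime();
        /* exchange_halos();  placeholder for the real MPI communication */
        MPI_Barrier(MPI_COMM_WORLD);   /* stands in for the real calls here */
        t_mpi += MPI_Wtime() - t0;
    }
    t_total = MPI_Wtime() - t_total;

    double max_mpi, max_total;
    MPI_Reduce(&t_mpi, &max_mpi, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&t_total, &max_total, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("max MPI time %.2f s of %.2f s total (%.0f%%)\n",
               max_mpi, max_total, 100.0 * max_mpi / max_total);

    MPI_Finalize();
    return 0;
}

If the MPI fraction turns out to be small, the gap between the two builds is coming from the computation itself (binding, memory layout, compiler flags) rather than from the MPI library.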
