Sorry for the delay in replying; my INBOX has become a disaster recently. More below.

On Sep 14, 2009, at 5:08 AM, Sam Verboven wrote:

Dear All,

I'm having the following problem. If I execute the exact same
application using both openmpi and mpich2, the former takes more than
2 times as long. When we compared the ganglia output we could see that
openmpi generates more than 60 percent System CPU whereas mpich2 only
has about 5 percent, the remaining cycles all going to User CPU. This
roughly explains the slowdown: when using openmpi we lose more than
half of the cycles to operating system overhead. We would very much like to know
why our openmpi implementation incurs such a dramatic overhead.

The only reason I could think of myself is that we use bridged
network interfaces on the cluster. Openmpi would not run properly
until we specified an MCA parameter to use the br0 interface
instead of the physical eth0. Mpich2 does not require any extra
parameters.


What did Open MPI do when you did not specify the use of br0?

I assume that br0 is a combination of some other devices, like eth0 and eth1? If so, what happens if you use "--mca btl_tcp_if_include eth0,eth1" instead of br0?
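
For example, something like this (just a sketch, based on your command below, and assuming your physical NICs really are eth0 and eth1 -- substitute whatever interfaces br0 actually bridges):

mpirun.openmpi --mca btl_tcp_if_include eth0,eth1 --prefix
/usr/shares/mpi/openmpi -hostfile hostfile -np 224
/home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2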

The calculations themselves are done using fortran. The operating
system is ubuntu 9.04, we have 14 dual quad core nodes and both
openmpi and mpich2 are compiled from source without any configure
options.

Full command OpenMPI:
mpirun.openmpi --mca btl_tcp_if_include br0 --prefix
/usr/shares/mpi/openmpi -hostfile hostfile -np 224
/home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2

Full command Mpich2:
mpiexec.mpich2 -machinefile machinefile -np 113
/home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2


I notice that you're running almost 2x as many processes with Open MPI as with MPICH2 (224 vs. 113) -- does increasing the number of processes increase the problem size, or have some other effect on overall run-time?
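
It might also be worth re-running with the same process count on both, just to compare apples to apples -- for example (a sketch, re-using your exact commands from above with only -np changed):

mpirun.openmpi --mca btl_tcp_if_include br0 --prefix
/usr/shares/mpi/openmpi -hostfile hostfile -np 113
/home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2

mpiexec.mpich2 -machinefile machinefile -np 113
/home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2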

--
Jeff Squyres
jsquy...@cisco.com
