Dear All,

I'm having the following problem. When I run the exact same
application under both Open MPI and MPICH2, the former takes more
than twice as long. Comparing the ganglia output, we saw that Open
MPI generates more than 60 percent System CPU, whereas MPICH2 shows
only about 5 percent, with the remaining cycles all going to User
CPU. This largely explains the slowdown: under Open MPI we lose more
than half our cycles to operating-system overhead. We would very much
like to know why our Open MPI installation incurs such a dramatic
overhead.
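
For what it's worth, the split can be cross-checked outside ganglia
on any compute node while the job runs; a minimal check, assuming the
sysstat package is installed:

# %user and %sys should mirror ganglia's User/System CPU split
mpstat 5 3
# without sysstat, the raw counters are in /proc/stat
# (fields after "cpu": user nice system idle iowait irq softirq)
grep '^cpu ' /proc/stat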

The only cause I could think of myself is that we use bridged network
interfaces on the cluster. Open MPI would not run properly until we
set the MCA parameter btl_tcp_if_include to the br0 bridge instead of
the physical eth0. MPICH2 does not require any extra parameters.
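
For completeness, the interface selection can be inspected from the
command line; the btl list in the second command is our assumption of
a sensible transport set (TCP between nodes, shared memory within a
node), not something we have verified:

# show the TCP BTL parameters actually in effect
ompi_info --param btl tcp
# pin Open MPI to the bridge and to an explicit set of transports
mpirun.openmpi --mca btl tcp,sm,self --mca btl_tcp_if_include br0 ...
# (remaining arguments as in the full command below)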

The calculations themselves are written in Fortran. The operating
system is Ubuntu 9.04, the cluster consists of 14 dual quad-core
nodes (112 cores in total), and both Open MPI and MPICH2 were
compiled from source without any configure options.
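
Concretely, both builds were done along these lines (illustrative; we
pass nothing to configure):

./configure
make
make install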

Full Open MPI command:
mpirun.openmpi --mca btl_tcp_if_include br0 \
    --prefix /usr/shares/mpi/openmpi \
    -hostfile hostfile -np 224 \
    /home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2

Full MPICH2 command:
mpiexec.mpich2 -machinefile machinefile -np 113 \
    /home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2
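
One thing we noticed while writing this up: the two runs use
different process counts (224 versus 113), and with 112 cores in
total the Open MPI run is oversubscribed roughly two to one. In case
that turns out to matter, this is the variant we would try next;
mpi_yield_when_idle is a standard Open MPI MCA parameter, but treat
the invocation as a sketch rather than a verified fix:

# ask busy-polling ranks to yield the CPU when they are idle
mpirun.openmpi --mca mpi_yield_when_idle 1 \
    --mca btl_tcp_if_include br0 --prefix /usr/shares/mpi/openmpi \
    -hostfile hostfile -np 224 \
    /home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2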

I appreciate any help you may be able to provide.

Logs:
http://win.ua.ac.be/~svboven/config.log
http://win.ua.ac.be/~svboven/ompi_info.txt

Yours faithfully,
Sam Verboven
