Hi,

We routinely oversubscribe processes to validate our code in nightly automated parallel runs.

I just compiled Open MPI 1.8.3 and launched the whole suite of sequential/parallel tests, and noticed a *major* slowdown in oversubscribed parallel tests with 1.8.3 compared to 1.6.5.

For example, on my computer (2 CPUs), a validation test of 64 processes launched with 1.8.3 took 1500 seconds (~25 minutes) to execute, while the very same test compiled with 1.6.5 took only 7.4 seconds!

To get this result with 1.6.5 we had to set the variable "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in 1.8.3 when I launch more processes than there are cores in my computer, even though the FAQ still says it works (see http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded). However, when I launch fewer processes than there are cores, the run is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the same behavior as in 1.6.5.

I tried to launch with a host file like this:

localhost slots=2

but it changed nothing...
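To make the setup above easier to reproduce, here is a rough sketch (in Python) of how such a launch could be assembled and timed. The names `./validation_test` and `hostfile` are placeholders, not our real paths, and the helper functions are only illustrative. It relies on nothing more than `mpirun -np`, `--hostfile`, and the environment variable mentioned above.

```python
import os
import subprocess
import time


def build_launch(nprocs, executable, hostfile=None, yield_when_idle=True):
    """Return (command, environment) for an mpirun launch.

    All argument values are illustrative; adapt them to the real test.
    """
    cmd = ["mpirun", "-np", str(nprocs)]
    if hostfile is not None:
        # e.g. a file containing the single line: localhost slots=2
        cmd += ["--hostfile", hostfile]
    cmd.append(executable)

    # Copy the current environment so the caller's settings are untouched.
    env = dict(os.environ)
    if yield_when_idle:
        env["OMPI_MCA_mpi_yield_when_idle"] = "1"
    else:
        env.pop("OMPI_MCA_mpi_yield_when_idle", None)
    return cmd, env


def time_launch(cmd, env):
    """Run the command and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env)
    return time.perf_counter() - start, result.returncode
```

Calling `time_launch(*build_launch(64, "./validation_test", hostfile="hostfile"))` once against each Open MPI installation, with and without the variable set, should be enough to compare the timings.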

What am I doing wrong?

Is it possible to recover the 1.6.5 performance for oversubscription?

Is there a compilation option that I have to enable in 1.8.3?

Here are the config.log and "ompi_info --all" files for both versions of Open MPI:

http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz

Thanks,

Eric



