Hi,
we routinely oversubscribe, purely for code validation, in the nightly
automated parallel runs of our code.
I just compiled Open MPI 1.8.3, launched the whole suite of
sequential/parallel tests, and noticed a *major* slowdown in the
oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
For example, on my computer (2 cores), a validation test with 64 processes
launched with 1.8.3 took 1500 seconds (25 minutes) to execute, while
the very same test compiled with 1.6.5 took only 7.4 seconds!
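To put those two timings side by side, here is a quick back-of-the-envelope calculation. It only reuses the figures reported above (1500 s vs 7.4 s, 64 processes on 2 cores); the variable names are mine and nothing here touches Open MPI itself.

```python
# Wall-clock times for the same 64-process validation test on a 2-core
# machine, as reported above.
t_183 = 1500.0  # seconds, Open MPI 1.8.3
t_165 = 7.4     # seconds, Open MPI 1.6.5 with OMPI_MCA_mpi_yield_when_idle=1

nprocs = 64
ncores = 2

slowdown = t_183 / t_165          # how many times slower 1.8.3 is
minutes_183 = t_183 / 60          # 1.8.3 run time in minutes
procs_per_core = nprocs / ncores  # degree of oversubscription

print(f"1.8.3 run: {minutes_183:.0f} min")
print(f"slowdown: {slowdown:.0f}x")
print(f"oversubscription: {procs_per_core:.0f} processes per core")
```

So 1.8.3 is roughly 200 times slower on a run that oversubscribes each core 32 times.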
To get this result with 1.6.5 we had to set the environment variable
"OMPI_MCA_mpi_yield_when_idle=1", but in 1.8.3 it seems to have no effect
when I launch more processes than there are cores on my computer,
even though the FAQ still says it should work (see
http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded). However,
when I launch fewer processes than there are cores, the run is
faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the same
behavior as in 1.6.5.
I also tried launching with a host file containing:
  localhost slots=2
but it changed nothing.
What am I doing wrong?
Is it possible to recover the 1.6.5 performance when oversubscribing?
Is there a build option that I have to enable in 1.8.3?
Here are the config.log and "ompi_info --all" output files for both
Open MPI versions:
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
Thanks,
Eric