Hi Lars,
First off, I think Jeff makes some very good points.
If you still think your applications will benefit from yielding
instead of hogging the CPU, you should probably try the parameter
"mpi_show_mca_params". It will give you a list of the MCA parameters
at runtime, so you can see what the yield_when_idle parameter really
looks like when your job runs; Open MPI sometimes overrides the user's
setting. If yield_when_idle is disabled, I think changes have to be
made to the Open MPI code to make it yield.
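(For example, and assuming an Open MPI 1.2-era build, something like
"mpirun --mca mpi_show_mca_params 1 ..." should print the effective MCA
parameter values during MPI_Init; check ompi_info on your version for
the exact values this parameter accepts.)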
Guess this didn't help at all, but at least you can check if you are
curious :)
Best regards,
Torje Henriksen
On Apr 13, 2008, at 1:51 PM, Jeff Squyres wrote:
Sorry for the delays in replying.
The central problem is that Open MPI is much more aggressive about its
message passing progress than LAM is -- it simply wasn't designed to
share the CPU well; that's a deliberate trade-off to get as high
performance as possible.
mpi_yield_when_idle is most helpful only for certain transports that
actively use our event engine, such as the TCP device. Since you're
using the LAM sysv RPI, I assume you're using the TCP and shared
memory devices in OMPI, right? If you're using InfiniBand, for
example, the event engine is not called much because IB has its own
progression engine that is unrelated to OMPI's (and therefore we don't
invoke OMPI's much).
mpi_yield_when_idle is also only helpful if you're going into the MPI
layer often and making message passing progress (i.e., OMPI's event
engine is actively being invoked). Is this true for your application?
If mpi_yield_when_idle really doesn't help much, you may consider
sprinkling calls to sched_yield() in your code to force the process
to yield the processor.
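As an illustration only, a minimal sketch of that idea -- a
hypothetical polling wait loop, not code from either application --
could look like this in C:

#include <sched.h>   /* sched_yield() */
#include <mpi.h>

/* Hypothetical helper: poll an outstanding request and give up the
   processor between polls instead of spinning at full speed. */
static void wait_politely(MPI_Request *req, MPI_Status *status)
{
    int done = 0;
    while (!done) {
        /* MPI_Test drives Open MPI's progress engine on each call. */
        MPI_Test(req, &done, status);
        if (!done) {
            sched_yield();   /* let other runnable processes have the CPU */
        }
    }
}

Whether this helps depends on how much time the processes actually
spend in such wait loops.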
On Apr 4, 2008, at 2:30 AM, Lars Andersson wrote:
Hi,
I'm just in the process of moving our application from LAM/MPI to
OpenMPI, mainly because OpenMPI makes it easier for a user to run
multiple jobs (MPI universes) simultaneously. This is useful if a user
wants to run smaller experiments without disturbing a large experiment
running in the background. I've been evaluating the performance using
a simple test, running on a heterogeneous cluster of 2 x dual-core
Opteron machines, a couple of dual-core P4 Xeon machines and an 8-core
Core2 machine. The main structure of the application is a master rank
distributing job packages to the rest of the ranks and collecting the
results. We don't use any fancy MPI features but rather see it as an
efficient low-level tool for broadcasting and transferring data.
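A generic sketch of that master/worker structure (hypothetical tags
and a dummy payload, not the actual application code) might look like:

#include <mpi.h>

#define JOB_TAG    1   /* hypothetical tag for outgoing job packages */
#define RESULT_TAG 2   /* hypothetical tag for returned results */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Master: hand one job to every worker, then collect the results. */
        double job = 0.0, result;
        for (int r = 1; r < size; ++r)
            MPI_Send(&job, 1, MPI_DOUBLE, r, JOB_TAG, MPI_COMM_WORLD);
        for (int r = 1; r < size; ++r)
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, RESULT_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        /* Worker: receive a job package, do the work, send the result back. */
        double job, result;
        MPI_Recv(&job, 1, MPI_DOUBLE, 0, JOB_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = job + rank;   /* stand-in for the real computation */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, RESULT_TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}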
When a single user runs a job (fully subscribed nodes, but not
oversubscribed, i.e. one process per CPU core) on an otherwise unloaded
cluster, both LAM/MPI and OpenMPI average runtimes of about 1m33s
(OpenMPI has a slightly lower average).
When I start the same job simultaneously as two different users (thus
oversubscribing the nodes 2x) under LAM/MPI, the two jobs finish in an
average time of about 3m, thus scaling very well (we use the -ssi rpi
sysv option to mpirun under LAM/MPI to avoid busy waiting).
When running that same two-user experiment under OpenMPI, the average
runtime jumps up to about 3m30s, with runs occasionally taking more
than 4 minutes to complete. I do use the "--mca mpi_yield_when_idle 1"
option to mpirun, but it doesn't seem to make any difference. I've
also tried setting the environment variable
OMPI_MCA_mpi_yield_when_idle=1, but still no change. ompi_info says:
ompi_info --param all all | grep yield
                MCA mpi: parameter "mpi_yield_when_idle" (current value: "1")
The cluster is used for various tasks, running MPI applications as
well as non-MPI applications, so we would like to avoid spending too
many cycles on busy-waiting. Any ideas on how to tweak OpenMPI to get
better performance and more cooperative behavior in this case would be
greatly appreciated.
Cheers,
Lars
--
Jeff Squyres
Cisco Systems