Re: [OMPI users] Proper way to throw an error to all nodes?

Robert Kubrick Mon, 18 Aug 2008 20:17:32 -0400

A question related to an old thread:

in case of solution 2), how do you broadcast 'flags' to the slaves if they're processing asynchronous data? I understand MPI_Bcast is a collective operation requiring all processes in a communicator to call it before it completes. If the slaves are processing a number of data events in a continuous loop, the only solution I see is to send a special exit message from the master through MPI_Send.


Or is there a non-collective broadcast function I am missing?

On Jun 4, 2008, at 2:51 PM, Jeff Squyres wrote:

Yes -- MPI_Abort is the simplest way to get them all to die.  But
you'll also get error message(s) from OMPI.  So you have [at least] 2
options:

1. Exit with MPI error

-----
   if (rank == process_who_does_the_checking && !exists(filename)) {
      print("bad!");
      MPI_Abort(MPI_COMM_WORLD, 1);   /* second argument is the error code */
   }
-----

2. Exit with your own error; MPI finalizes cleanly

-----
   file_exists = 1;
   if (rank == process_who_does_the_checking && !exists(filename)) {
      print("bad!");
      file_exists = 0;
   }
   MPI_Bcast(&file_exists, 1, MPI_INT, process_who_does_the_checking,
             MPI_COMM_WORLD);
   if (!file_exists) {
      MPI_Finalize();
      exit(1);
   }
-----

There's oodles of variants on this, of course, but you get the general
idea.



On Jun 3, 2008, at 11:00 PM, David Singleton wrote:


This is exactly what MPI_Abort is for.

David

Terry Frankcombe wrote:

Calling MPI_Finalize in a single process won't ever do what you want.

You need to get all the processes to call MPI_Finalize for the end to be
graceful.

What you need to do is have some sort of special message to tell
everyone to die.  In my codes I have a rather dynamic master-slave model
with flags being broadcast by the master process to tell the slaves what
to expect next, so it's easy for me to send out an "it's all over,
please kill yourself" message.  For a more rigid communication pattern
you could embed the die message in the data: something like if the first
element of the received data is negative, then that's the sign things
have gone south and everyone should stop what they're doing and
MPI_Finalize.  The details depend on the details of your code.
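
A rough sketch of the "die message embedded in the data" idea follows. It
assumes the payload is an array of doubles whose first element is never
legitimately negative; the function name is_stop_message is hypothetical
and not taken from any code in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convention (an assumption, not from the thread's code):
 * a negative first element marks the buffer as a "stop now" message. */
int is_stop_message(const double *data, size_t count)
{
    /* A NULL buffer is only acceptable when it is also empty. */
    assert(data != NULL || count == 0);

    /* An empty buffer carries no marker, so it is not a stop message. */
    return count > 0 && data[0] < 0.0;
}
```

Each receiver would call this on every buffer it gets and, on a nonzero
result, leave its work loop and proceed to MPI_Finalize.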

Presumably you could also set something up using tags and message
polling.
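
One possible shape for the tags-and-polling approach is sketched below. It
assumes rank 0 is the master and that a dedicated tag (TAG_STOP, a
hypothetical name and value) is reserved for the stop message. Workers
check for it with the non-blocking MPI_Iprobe between data events, so no
collective call is needed. This is an illustration under those
assumptions, not code from any poster in this thread.

```c
#include <mpi.h>
#include <stdio.h>

#define TAG_STOP 2   /* hypothetical tag reserved for the stop message */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Master: pretend an error was detected and tell every worker
         * to stop, using ordinary point-to-point sends. */
        int dummy = 0;
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&dummy, 1, MPI_INT, dest, TAG_STOP, MPI_COMM_WORLD);
    } else {
        int stop = 0;
        long events = 0;

        while (!stop) {
            int flag = 0;
            MPI_Status status;

            /* Non-blocking check: has a stop message arrived from rank 0? */
            MPI_Iprobe(0, TAG_STOP, MPI_COMM_WORLD, &flag, &status);
            if (flag) {
                int dummy;
                /* Consume the message so nothing is left pending. */
                MPI_Recv(&dummy, 1, MPI_INT, 0, TAG_STOP, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                stop = 1;
            } else {
                events++;   /* stand-in for processing one data event */
            }
        }
        printf("rank %d stopping after %ld events\n", rank, events);
    }

    /* Every rank reaches MPI_Finalize, so the shutdown is clean. */
    MPI_Finalize();
    return 0;
}
```

Built with mpicc and launched with mpirun on two or more processes, each
worker should leave its loop once the stop message arrives. How often to
poll is a trade-off between shutdown latency and per-event overhead.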

Hope this helps.


On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc...@sneakemail.com wrote:

So I'm working on this program which has many ways it might possibly die
at runtime, but one that happens frequently is that the user types a
wrong (nonexistent) filename at the command prompt. As it is now, the
node looking for the file notices the file doesn't exist and tries to
terminate the program. It tries to call MPI_Finalize(), but the other
nodes are all waiting for a message from the node doing the file
reading, so MPI_Finalize waits forever until the user realizes the job
isn't doing anything and terminates it manually.

So, my question is: what's the "correct" graceful way to handle
situations like this? Is there some MPI function which can basically
throw an exception to all other nodes telling them to bail out now? Or
is the correct behaviour just to have the node that spotted the error
die quietly and wait for the others to notice?

Thanks for any suggestions!


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems
