On Nov 7, 2007, at 7:43 PM, Murat Knecht wrote:

when MPI_Spawn cannot launch an application for whatever reason, the
entire job is cancelled with some message like the following.

That is correct; MPI states that the default error handler is MPI_ERRORS_ARE_FATAL.

Is there a way to handle this nicely, e.g. by throwing an exception? I

Sure; change the error handler on the communicator that you pass to MPI_COMM_SPAWN (e.g., to MPI_ERRORS_RETURN).

I don't know if we have checked this particular code path to ensure that OMPI will be stable after this, but it might work...

understand, this does not work, when the job is first started with
mpirun, as there is no application yet to fall back on, but in case of a running application, it should be possible to simply inform it that the spawning request failed. Then the application could begin to handle the
error and terminate gracefully. I did enable C++ Exceptions btw, so I
guess this is not implemented. Is there a technical (e.g. architectural)
reason behind this, or simply a yet-to-be-added feature?

The MPI layer is written in C; it will not throw exceptions unless you use the MPI C++ bindings and set the MPI::ERRORS_THROW_EXCEPTIONS error handler. Also be sure to configure/build Open MPI with the --enable-cxx-exceptions flag, which adds the compiler flags that let C++ exceptions propagate through the C code (it's not enabled by default because it imposes a slight performance penalty).

--
Jeff Squyres
Cisco Systems
