The error handler wouldn't be called in that situation; we simply abort the job. We expect to provide that integration (invoking the error handler when a process fails) in something like the 1.7.4 release milestone.
On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo <etcama...@inf.ufpr.br> wrote:

> Hi All,
>
> I was looking for posts about fault tolerance in MPI and I found the post
> below:
>
> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>
> I am trying to understand all the work on failure detection present in
> Open MPI. So I began with a simple application, a ring application
> (ring.c), to understand error handlers. But it seems to me that it didn't
> work. Why not? (The code is below.)
>
> The application (all of its processes) was running on the same machine,
> started with the following command line:
>
> $ mpiexec -n 4 ring
>
> While the ring application was running, one of the processes was killed.
> The entire application stopped (fine so far), but it didn't show me the
> error message. Shouldn't the line if(error != MPI_SUCCESS) have caught it?
>
> I am using mpiexec (OpenRTE) 1.6.5.
>
> Thanks in advance,
>
> Edson
>
> -----------------------------------------------
> #include <stdio.h>
> #include <mpi.h>
> #include <time.h>
>
> int main( int argc, char *argv[] )
> {
>     int rank, size;
>     int n = 0;
>     int tag = 0;
>     int error;
>     int root = 0;
>     int next, previous;
>     double start = 0;
>     double finish = 0;
>
>     MPI_Status status;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     // error handler
>     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>
>     do {
>         next = (rank + 1) % (size);
>         n++;
>
>         if(rank != 0){
>             previous = (rank - 1);
>         }else{
>             previous = size - 1;
>         }
>
>         if (rank == root) {
>
>             error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
>
>             // if an error happens, print the message
>             if(error != MPI_SUCCESS){
>                 printf("error");
>             }
>
>             error = MPI_Recv( &n, 1, MPI_INT, previous, tag,
>                               MPI_COMM_WORLD, &status );
>
>             // if an error happens, print the message
>             if(error != MPI_SUCCESS){
>                 printf("error");
>             }
>         }
>         else {
>
>             error = MPI_Recv( &n, 1, MPI_INT, previous, tag,
>                               MPI_COMM_WORLD, &status );
>
>             // if an error happens, print the message
>             if(error != MPI_SUCCESS){
>                 printf("error");
>             }
>
>             error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
>
>             // if an error happens, print the message
>             if(error != MPI_SUCCESS){
>                 printf("error");
>             }
>         }
>         printf( "Process %d got %d\n", rank, n );
>
>         // wait a bit
>         start = MPI_Wtime();
>         finish = start;
>
>         while ( (finish - start) < 1 ){
>             finish = MPI_Wtime();
>         }
>
>     } while (n < 100);
>
>     MPI_Finalize();
>     return 0;
> }
> ----------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users