The error handler isn't called in that situation: when a process is killed,
we simply abort the job. We expect to provide that integration around the
1.7.4 release milestone.
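
As a side note on the ring program quoted below: its neighbour selection can
be written as two small pure functions, which makes the wrap-around easy to
check without launching MPI at all. This is only a minimal sketch; the names
ring_next and ring_prev are hypothetical and do not appear in the original
ring.c.

```c
/* Hypothetical helpers, not part of the quoted ring.c. */

/* Rank that `rank` sends to in a ring of `size` processes. */
int ring_next(int rank, int size)
{
    return (rank + 1) % size;
}

/* Rank that `rank` receives from. Adding `size` before taking the
 * remainder keeps the operand non-negative when rank is 0. */
int ring_prev(int rank, int size)
{
    return (rank - 1 + size) % size;
}
```

With "mpiexec -n 4", rank 3 would send to rank 0 and rank 0 would receive
from rank 3, which matches the if/else in the quoted code.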


On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo <etcama...@inf.ufpr.br> 
wrote:

> Hi All,
> 
> I was looking for posts about fault tolerance in MPI and found the post
> below:
> 
> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
> 
> I am trying to understand the failure-detection work present in
> Open MPI, so I began with a simple ring application (ring.c) to
> understand error handlers. But it seems that it didn't work. Why not?
> (The code is below.)
> 
> The application (all of its processes) was running on a single machine,
> started with the following command line:
> 
> $ mpiexec -n 4 ring
> 
> While the ring application was running, one of the processes was killed.
> The entire application stopped (fine so far), but it didn't show me the
> error message. Shouldn't the line if(error != MPI_SUCCESS) have worked?
> 
> I am using the mpiexec (OpenRTE) 1.6.5.
> 
> Thanks in advance,
> 
> Edson
> 
> -----------------------------------------------
> #include <stdio.h>
> #include <mpi.h>
> #include <time.h>
> 
> int main( int argc, char *argv[] )
> {
>    int rank, size;
>    int n = 0;
>    int tag = 0;
>    int error;
>    int root = 0;
>    int next, previous;
>    double start = 0;
>    double finish = 0;
> 
>    MPI_Status status;
> 
>    MPI_Init( &argc, &argv );
>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>    // error handler
>    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> 
>    do {
>        next = (rank + 1) % (size);
>        n++;
> 
>        if(rank != 0){
>            previous = (rank - 1);
>        }else{
>            previous = size - 1;
>        }
> 
>        if (rank == root) {
> 
>            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
> 
>            // if an error happens, print the message
>            if(error != MPI_SUCCESS){
>                printf("error\n");
>            }
> 
>            error = MPI_Recv( &n, 1, MPI_INT, previous, tag,
>                              MPI_COMM_WORLD, &status );
> 
>            // if an error happens, print the message
>            if(error != MPI_SUCCESS){
>                printf("error\n");
>            }
>        }
>        else {
> 
>            error = MPI_Recv( &n, 1, MPI_INT, previous, tag,
>                              MPI_COMM_WORLD, &status );
> 
>            // if an error happens, print the message
>            if(error != MPI_SUCCESS){
>                printf("error\n");
>            }
> 
>            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
> 
>            // if an error happens, print the message
>            if(error != MPI_SUCCESS){
>                printf("error\n");
>            }
>        }
>        printf( "Process %d got %d\n", rank, n );
> 
>        // wait a bit
>        start = MPI_Wtime();
>        finish = start;
> 
>        while ( (finish - start) < 1 ){
>            finish =  MPI_Wtime();
>        }
> 
>    } while (n < 100);
> 
>    MPI_Finalize();
>    return 0;
> }
> ----------------------------
