On 03/27/2014 01:53 PM, Sasso, John (GE Power & Water, Non-GE) wrote:
When a piece of software built against OpenMPI fails, I will see an
error referring to the rank of the MPI task which incurred the failure.
For example:

MPI_ABORT was invoked on rank 1236 in communicator MPI_COMM_WORLD
with errorcode 1.

Unfortunately, I do not have access to the software code, just the
installation directory tree for OpenMPI.  My question is:  Is there a
flag that can be passed to mpirun, or an environment variable set, which
would reveal the mapping of ranks to the hosts they are on?

I do understand that one could have multiple MPI ranks running on the
same host, but finding a way to determine which rank ran on what host
would go a long way toward troubleshooting problems which may be
specific to the host.  Thanks!

In the past, I've done something like this (in C, though a similar approach would work in Fortran and other languages):

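#include <stdio.h>
#include <sys/utsname.h>
/* ... */
int debug = 1;                  /* set to 0 to silence the report */
char *cpu_name;
struct utsname uts;

/* ... later, after MPI_Init/MPI_Comm_rank/MPI_Comm_size ... */

uname(&uts);                    /* fill in the system identification */
cpu_name = uts.nodename;        /* name of the node this rank runs on */

/* rank is the value returned by MPI_Comm_rank */
if (debug == 1) {
        printf("hostname=%s, I am rank %d\n", cpu_name, rank);
}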




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615
