Hi folks,

I have a really strange problem: a very simple MPI test program (see
below) runs successfully for all users with 4 processes on 1 node.
With 8 processes on 2 nodes, however, it hangs for user A but runs
successfully for user B. The executable and the appfile are the same
for user A and user B. Both users launch it with

mpirun --app appfile

where the content of 'appfile' is

-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test

for the single node run with 4 processes and is replaced by

-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test

for the 2-node run with 8 processes. Just to recap, the single-node
run works for both user A and user B, but the 2-node run only works
for user B and hangs for user A. The hung run does respond to Ctrl-C,
though.
Both users use bash, have set up passwordless ssh, are able to ssh
from node1 to node2 and back, have the same PATH and use the same
'mpirun' executable.

At this point I've run out of ideas about what to check and debug,
because the two setups look identical. The test program is simply

#include <stdio.h>
#include <mpi.h>

int main( int argc, char **argv )
{
   int node;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &node );

   printf( "First Hello World from Node %d\n", node );
   MPI_Barrier( MPI_COMM_WORLD );
   printf( "Second Hello World from Node %d\n",node );

   MPI_Finalize(  );

   return 0;
}


I also asked both users to compile the test program separately, and
the resulting executable 'test' is the same for both, which again
indicates that identical gcc, mpicc, etc. are used. GCC is 4.5.1,
Open MPI is 1.5, and the interconnect is InfiniBand.

I've really run out of ideas about what else to compare between user A and B.

Thanks for any hints,
Daniel





-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown


