Make sure that the PATH really is identical between users -- especially for non-interactive logins.  E.g.:

    env

vs.

    ssh othernode env

Also check the LD_LIBRARY_PATH.

On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson <fetchin...@googlemail.com> wrote:

> Hi folks,
>
> I have a really strange problem: a super simple MPI test program (see
> below) runs successfully for all users when executed on 4 processes in
> 1 node, but hangs for user A and runs successfully for user B when
> executed on 8 processes in 2 nodes. The executable used is the same
> and the appfile used is also the same for user A and user B. Both
> users launch it by
>
> mpirun --app appfile
>
> where the content of 'appfile' is
>
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
>
> for the single node run with 4 processes and is replaced by
>
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
>
> for the 2-node run with 8 processes. Just to recap, the single node
> run works for both user A and user B, but the 2-node run only works
> for user B and it hangs for user A. It does respond to Ctrl-C though.
> Both users use bash, have set up passwordless ssh, are able to ssh
> from node1 to node2 and back, have the same PATH and use the same
> 'mpirun' executable.
>
> At this point I've run out of ideas what to check and debug because
> the setups look really identical.
> The test program is simply
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main( int argc, char **argv )
> {
>     int node;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &node );
>
>     printf( "First Hello World from Node %d\n", node );
>     MPI_Barrier( MPI_COMM_WORLD );
>     printf( "Second Hello World from Node %d\n",node );
>
>     MPI_Finalize( );
>
>     return 0;
> }
>
> I also asked both users to compile the test program separately, and
> the resulting executable 'test' is the same for both indicating again
> that identical gcc, mpicc, etc, is used. Gcc is 4.5.1 and openmpi is
> 1.5. and the interconnect is infiniband.
>
> I've really run out of ideas what else to compare between user A and B.
>
> Thanks for any hints,
> Daniel
>
> --
> Psss, psss, put it down! - http://www.cafepress.com/putitdown
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
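
P.S. In case it helps, here is a rough sketch of the env comparison suggested at the top of this reply, so that each user can run it and compare results. It is only an illustration: the function name compare_env_vars and the /tmp file names are made up for this example, and it assumes a POSIX shell plus the passwordless ssh you already have. The point of going through `ssh othernode env` is that the remote command runs in a non-interactive shell, which may read different startup files than an interactive login does.

```shell
#!/bin/sh
# compare_env_vars LOCAL_DUMP REMOTE_DUMP [VAR ...]
# (hypothetical helper, not part of Open MPI)
# Prints a short report for every listed variable whose value differs
# between two files written by `env` (one NAME=value pair per line).
# If no variables are listed, PATH and LD_LIBRARY_PATH are checked.
compare_env_vars() {
    local_dump=$1
    remote_dump=$2
    shift 2
    [ $# -gt 0 ] || set -- PATH LD_LIBRARY_PATH
    for var in "$@"; do
        # "|| true" keeps a missing variable from counting as an error.
        l=$(grep "^${var}=" "$local_dump" | head -n 1 || true)
        r=$(grep "^${var}=" "$remote_dump" | head -n 1 || true)
        if [ "$l" != "$r" ]; then
            printf 'DIFF %s\n' "$var"
            printf '  local:  %s\n' "${l:-(unset)}"
            printf '  remote: %s\n' "${r:-(unset)}"
        fi
    done
}

# Example use, run once as user A and once as user B
# (the /tmp file names are arbitrary):
#   env > /tmp/env.local
#   ssh othernode env > /tmp/env.remote
#   compare_env_vars /tmp/env.local /tmp/env.remote
```

No output means the two variables match between the local shell and the non-interactive remote one. Any DIFF line for user A that user B does not get is worth a closer look.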