Make sure that the PATH really is identical between users -- especially for 
non-interactive logins.  E.g.:

env

vs. 

ssh othernode env

Also check the LD_LIBRARY_PATH.
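
If eyeballing the two env listings is error-prone, something like the following may help. This is only a rough sketch: the helper names (env_value, compare_env_var) are made up for illustration, and it assumes each variable sits on a single line of the env output.

```shell
# env_value FILE VAR
# Print the value of VAR as recorded in a saved `env` dump (empty if absent).
env_value() {
    grep "^$2=" "$1" | head -n 1 | cut -d= -f2-
}

# compare_env_var FILE_A FILE_B VAR
# Report whether VAR has the same value in both dumps; show both if not.
compare_env_var() {
    a=$(env_value "$1" "$3")
    b=$(env_value "$2" "$3")
    if [ "$a" = "$b" ]; then
        echo "$3: same"
    else
        echo "$3: DIFFERENT"
        echo "  $1: $a"
        echo "  $2: $b"
    fi
}
```

Each user would then run, on node1 (local.env and remote.env are just placeholder file names):

    env > local.env
    ssh othernode env > remote.env
    compare_env_var local.env remote.env PATH
    compare_env_var local.env remote.env LD_LIBRARY_PATH

The second dump is the interesting one, since it is taken from a non-interactive login.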


On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson <fetchin...@googlemail.com> 
wrote:

> Hi folks,
> 
> I have a really strange problem: a super simple MPI test program (see
> below) runs successfully for all users when executed on 4 processes in
> 1 node, but hangs for user A and runs successfully for user B when
> executed on 8 processes in 2 nodes. The executable used is the same
> and the appfile used is also the same for user A and user B. Both
> users launch it by
> 
> mpirun --app appfile
> 
> where the content of 'appfile' is
> 
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> 
> for the single node run with 4 processes and is replaced by
> 
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> 
> for the 2-node run with 8 processes. Just to recap, the single node
> run works for both user A and user B, but the 2-node run only works
> for user B and it hangs for user A. It does respond to Ctrl-C though.
> Both users use bash, have set up passwordless ssh, are able to ssh
> from node1 to node2 and back, have the same PATH and use the same
> 'mpirun' executable.
> 
> At this point I've run out of ideas what to check and debug because
> the setups look really identical. The test program is simply
> 
> #include <stdio.h>
> #include <mpi.h>
> 
> int main( int argc, char **argv )
> {
>     int node;
> 
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &node );
> 
>     printf( "First Hello World from Node %d\n", node );
>     MPI_Barrier( MPI_COMM_WORLD );
>     printf( "Second Hello World from Node %d\n", node );
> 
>     MPI_Finalize( );
> 
>     return 0;
> }
> 
> 
> I also asked both users to compile the test program separately, and
> the resulting executable 'test' is the same for both indicating again
> that identical gcc, mpicc, etc, is used. Gcc is 4.5.1 and openmpi is
> 1.5. and the interconnect is infiniband.
> 
> I've really run out of ideas what else to compare between user A and B.
> 
> Thanks for any hints,
> Daniel
> 
> 
> 
> 
> 
> -- 
> Psss, psss, put it down! - http://www.cafepress.com/putitdown
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

