On Tue, Jan 15, 2008 at 07:54:33PM -0500, Mark Kosmowski wrote:
> Dear Open-MPI Community:
>
> I have a 3 node cluster, each a dual opteron workstation running
> OpenSUSE 10.1 64-bit.  The node names are LT, SGT and PFC.  When I
> start an mpirun job from either SGT or PFC, things work as they are
> supposed to.  However, if I start the same job from LT, the job hangs
> at SGT - this was confirmed by mpirun --np 6 --hostfile <correct
> hostfile for the three nodes> hostname, which gives only LT; LT; PFC;
> PFC (and then hangs) when started from LT (this same command started
> from either of the other nodes gives two of each of the three hostnames
> and terminates normally).  The nfs share drive is physically located
> on LT.
>
> I have been using ssh to get to either SGT or PFC from a terminal
> opened originally on LT to run jobs.  I can ssh from any node to any
> other node.
>
> I have attached a gzipped tar archive of the three ifconfig results
> (for each node) and the results of ompi_info --all command as
> requested in the "Getting Help" section.  I was unable to locate a
> config.log file in the shared ompi directory.
>
> Any assistance on this matter would be appreciated,
>
> Mark E. Kosmowski

I'd posted a message earlier about intermittent hangs -- perhaps it's
the same issue.  If you run a hundred instances or so of
"mpirun --np 6 --hostfile hostfile uptime" from SGT or PFC, do you
notice any hangs?

Barry Rountree
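
The repeated-run check suggested above could be scripted roughly as follows. This is only a sketch, assuming GNU coreutils `timeout` is installed on the node you launch from; the function name `count_hangs` and the 30-second limit in the usage comment are illustrative choices, not anything from this thread.

```shell
#!/bin/sh
# count_hangs RUNS LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times, each under a LIMIT-second
# time limit, and prints how many runs had to be killed for exceeding it.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# time limit is reached.
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        timeout "$limit" "$@" >/dev/null 2>&1
        if [ $? -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}

# Example use from SGT or PFC (30 seconds is an arbitrary limit):
#   count_hangs 100 30 mpirun --np 6 --hostfile hostfile uptime
```

A nonzero count means at least one run did not finish within the limit, which would point to the intermittent hang rather than something specific to launching from LT. A run that fails quickly for some other reason is not counted, so it is worth trying the bare mpirun command by hand once first.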