On Tue, Jan 15, 2008 at 07:54:33PM -0500, Mark Kosmowski wrote:
> Dear Open-MPI Community:
> 
> I have a 3 node cluster, each a dual opteron workstation running
> OpenSUSE 10.1 64-bit.  The node names are LT, SGT and PFC.  When I
> start an mpirun job from either SGT or PFC, things work as they are
> supposed to.  However, if I start the same job from LT, the job hangs
> at SGT - this was confirmed by mpirun --np 6 --hostfile <correct
> hostfile for the three nodes> hostname, which gives only LT; LT; PFC;
> PFC (and then hangs) when started from LT (this same command started
> from either of the other nodes gives two of each of the three hostnames
> and terminates normally).  The nfs share drive is physically located
> on LT.
> 
> I have been using ssh to get to either SGT or PFC from a terminal
> opened originally on LT to run jobs.  I can ssh from any node to any
> other node.
> 
> I have attached a gzipped tar archive of the three ifconfig results
> (for each node) and the output of the ompi_info --all command as
> requested in the "Getting Help" section.  I was unable to locate a
> config.log file in the shared ompi directory.
> 
> Any assistance on this matter would be appreciated,
> 
> Mark E. Kosmowski

I'd posted a message earlier about intermittent hangs -- perhaps it's
the same issue.  If you run a hundred instances or so of "mpirun --np 6
--hostfile hostfile uptime", from SGT or PFC, do you notice any hangs?
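
A minimal sketch of such a repeated run, assuming a POSIX shell and the coreutils timeout(1) utility are available on the launching node. The function name run_trials, the 30-second limit, and the hostfile name are placeholders for illustration, not part of Open MPI. A run is counted as a hang when timeout has to kill it, which timeout reports with exit status 124.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and count the runs that
# fail to exit within a time limit.
# Usage: run_trials COUNT LIMIT_SECONDS COMMAND [ARGS...]
run_trials() {
    count=$1
    limit=$2
    shift 2
    hangs=0
    i=1
    while [ "$i" -le "$count" ]; do
        status=0
        # timeout(1) exits with status 124 when it had to kill the command.
        timeout "$limit" "$@" > /dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
            echo "run $i: no exit within ${limit}s"
        fi
        i=$((i + 1))
    done
    echo "hangs: $hangs of $count"
}

# Example invocation (limit and hostfile name are placeholders):
# run_trials 100 30 mpirun --np 6 --hostfile hostfile uptime
```

Running the example line from SGT or PFC, and again from LT, would show whether the hang is intermittent on the nodes that otherwise appear to work.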

Barry Rountree

> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
