Dear Gus,

Thanks for your help - your clue solved my problem!

The ultimate solution was to limit MPI communications to the local,
unrouted subnet. I made this the default behavior for all users of my
cluster by adding the following line to the bottom of my
$prefix/etc/openmpi-mca-params.conf file:

btl_tcp_if_include = 10.0.0.0/8
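
For anyone finding this later, here is a rough Python sketch of what that
CIDR filter means: only interfaces whose address falls inside 10.0.0.0/8
are kept. It is only an illustration of the subnet test, not Open MPI's
own code. The helper name allowed_interfaces and the sample addresses are
made up for the example and are not taken from my nodes.

```python
import ipaddress

# CIDR block from the btl_tcp_if_include setting above.
ALLOWED = ipaddress.ip_network("10.0.0.0/8")


def allowed_interfaces(interfaces, network=ALLOWED):
    """Return the names of interfaces whose address lies inside `network`.

    `interfaces` maps an interface name to its IPv4 address string.
    This is a hypothetical helper, written only to illustrate the filter.
    """
    return [
        name
        for name, addr in interfaces.items()
        if ipaddress.ip_address(addr) in network
    ]


# Hypothetical addresses, for illustration only.
example = {"en1": "192.0.2.17", "en2": "10.1.0.5", "lo0": "127.0.0.1"}
print(allowed_interfaces(example))  # prints ['en2']
```

Running a check like this against the ifconfig output of each node is one
way to confirm that every node has an interface inside the chosen subnet.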

Thanks again - what a relief!

Jed

On Fri, Jul 5, 2013, at 01:25 AM, Gustavo Correa wrote:
> Hi Jed 
> 
> You could try selecting only the Ethernet interface that matches your
> nodes' IP addresses, which seems to be en2.
> 
> The en1 interface seems to have an external IP.
> Not sure about en3, but it is odd that it has a
> different IP than en2 while being in the same subnet.
> I wonder if this may be the reason for the program hanging.
> 
> You may need to check the ifconfig output on all nodes for a consistent
> set of interfaces/IP addresses,
> and tailor your mpiexec command line and your hostfile accordingly.
> 
> Say, something like this:
> 
> mpiexec -mca btl_tcp_if_include en2 -hostfile your_hostfile -np 43 ./ring_c
> 
> See this FAQ (actually, all of them are very informative):
> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> 
> I hope this helps,
> Gus Correa
