On Aug 12, 2009, at 19:09 PM, Ralph Castain wrote:
Hmmm...well, I'm going to ask our TCP friends for some help here.
Meantime, I do see one thing that stands out. Port 4 is an awfully
low port number that usually sits in the reserved range. I checked
the /etc/services file on my Mac, and it was commented out as
unassigned, which should mean it was okay.
Still, that is an unusual number. The default minimum port number is
1024, so I'm puzzled how you wound up down there. Of course, could
just be an error in the print statement, but let's try moving it to
be safe? Set
-mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32
and see what happens.
What happens is that everything works now! Both connectivity_c and
the MITgcm. I haven't tried under Torque yet, but let's declare an
Open MPI victory at this point.
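For anyone finding this thread later, here is roughly how those two
settings fit onto a command line. This is only a sketch: the helper name
tcp_port_args, the hostfile name my_hosts and the process count are
placeholders of mine, not anything from Open MPI itself. As I understand
it, the first parameter is the lowest port the TCP BTL will try and the
second is how many ports it may try from there, so the values below
should cover ports 36900 through 36931.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): prints the two MCA settings
# suggested above so they can be spliced into an mpirun command line.
#   $1 = lowest TCP port the BTL should try
#   $2 = number of ports in the range
tcp_port_args() {
    printf '%s' "-mca btl_tcp_port_min_v4 $1 -mca btl_tcp_port_range_v4 $2"
}

# Example use (my_hosts and -np 2 are placeholders):
#   mpirun -np 2 --hostfile my_hosts $(tcp_port_args 36900 32) ./connectivity_c
```

Keeping the range small makes it easier to open just those ports in a
firewall between the nodes, if one is in the way.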
On Aug 13, 2009, at 8:28 AM, Jeff Squyres wrote:
Agreed -- ports 4 and 260 should be in the reserved ports range.
Are you running as root, perchance?
Errrr, no, but yes. My user account has admin privileges. A sloppy
workstation OS X habit I now regret propagating to my cluster. I'm
sorry I didn't mention it earlier as possibly relevant.
As a suggestion, btl_base_verbose could be mentioned as a good
debugging tool in the troubleshooting section of the FAQ. It's on the
page to do with TCP, which I admit I should have read as soon as I
realized there was a communication issue, but having it in the
troubleshooting section would be helpful too. i.e. maybe a more
erudite version of:
Checking connections between nodes:
Sometimes the configuration of a cluster makes it impossible for nodes
to communicate properly. To debug this, it helps to include --mca
btl_base_verbose 30 as a command line argument (see http://www.open-mpi.org/faq/?category=tcp
for more information). The program examples/connectivity_c.c is also
a useful minimal program for testing communication on the cluster.
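A command line along these lines might go with that entry. It is a
sketch under assumptions: the hostfile name my_hosts, the process count
and the log file name btl_debug.log are placeholders, and it assumes
connectivity_c has already been built in the current directory (for
example with mpicc examples/connectivity_c.c -o connectivity_c). If
mpirun or the binary is missing, it only prints the command it would
have run.

```shell
#!/bin/sh
# Sketch: run the connectivity test with verbose BTL output.
# HOSTFILE, NP and the log file name are placeholders, not Open MPI defaults.
HOSTFILE=${HOSTFILE:-my_hosts}
NP=${NP:-2}
VERBOSE_ARGS="--mca btl_base_verbose 30"

if command -v mpirun >/dev/null 2>&1 && [ -x ./connectivity_c ]; then
    # Keep a copy of the verbose output so it can be posted to the list.
    # VERBOSE_ARGS is left unquoted on purpose so it splits into three words.
    mpirun -np "$NP" --hostfile "$HOSTFILE" $VERBOSE_ARGS ./connectivity_c 2>&1 | tee btl_debug.log
else
    echo "would run: mpirun -np $NP --hostfile $HOSTFILE $VERBOSE_ARGS ./connectivity_c"
fi
```

The verbose output should show which addresses and ports each process
tries to use, which is what exposed the odd port numbers in my case.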
Thanks again for everyone's help, particularly Ralph, Jeff and Gus.
Cheers, Jody