Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-14 Thread Jody Klymak
Hi All, Just to polish this thread off: to make openmpi work on my OS X 10.5 machine I need only: ./configure --prefix=/Network/Xgrid/openmpi; make; make install. I then edited /Network/Xgrid/openmpi/etc/openmpi-mca-params.conf and added # set ports so that they are more valid than the

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-13 Thread Jody Klymak
On Aug 12, 2009, at 19:09 PM, Ralph Castain wrote: Hmmm...well, I'm going to ask our TCP friends for some help here. Meantime, I do see one thing that stands out. Port 4 is an awfully low port number that usually sits in the reserved range. I checked the /etc/services file on my Mac, and

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-13 Thread Jeff Squyres
Agreed -- ports 4 and 260 should be in the reserved ports range. Are you running as root, perchance? On Aug 12, 2009, at 10:09 PM, Ralph Castain wrote: Hmmm...well, I'm going to ask our TCP friends for some help here. Meantime, I do see one thing that stands out. Port 4 is an awfully

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Ralph Castain
Hmmm...well, I'm going to ask our TCP friends for some help here. Meantime, I do see one thing that stands out. Port 4 is an awfully low port number that usually sits in the reserved range. I checked the /etc/services file on my Mac, and it was commented out as unassigned, which should

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Gus Correa
Hi Jody Jody Klymak wrote: On Aug 11, 2009, at 18:55 PM, Gus Correa wrote: Did you wipe off the old directories before reinstalling? Check. I prefer to install on a NFS mounted directory, Check Have you tried to ssh from node to node on all possible pairs? check - fixed this

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
On Aug 12, 2009, at 12:46 PM, Jody Klymak wrote: So I think ranks 0 and 2 are on xserve02 and rank 1 is on xserve01, Should read xserve03, -- Jody Klymak http://web.uvic.ca/~jklymak/

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
On Aug 12, 2009, at 12:31 PM, Ralph Castain wrote: Well, it is getting better! :-) On your cmd line, what btl's are you specifying? You should try -mca btl sm,tcp,self for this to work. Reason: sometimes systems block tcp loopback on the node. What I see below indicates that inter-node

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Ralph Castain
Well, it is getting better! :-) On your cmd line, what btl's are you specifying? You should try -mca btl sm,tcp,self for this to work. Reason: sometimes systems block tcp loopback on the node. What I see below indicates that inter-node comm was fine, but the two procs that share a node couldn't
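A minimal sketch of the command line Ralph suggests above. Only the -mca btl sm,tcp,self list comes from the thread; the process count and the program name ./a.out are placeholders assumed for illustration. The command is only assembled and printed here, so it can be inspected before being run on a real cluster.

```shell
# BTL list recommended in the thread: shared memory, TCP, and self (loopback).
BTL_LIST="sm,tcp,self"

# Assumed placeholders, not from the thread: 4 processes and a program called ./a.out.
NPROCS=4
PROGRAM="./a.out"

# Assemble the command line and print it for inspection.
MPIRUN_CMD="mpirun -np ${NPROCS} -mca btl ${BTL_LIST} ${PROGRAM}"
echo "${MPIRUN_CMD}"
```

Including sm lets two processes on the same node talk through shared memory, so they do not depend on TCP loopback being allowed on that node.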

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
Hi Ralph, That gives me something more to work with... On Aug 12, 2009, at 9:44 AM, Ralph Castain wrote: I believe TCP works fine, Jody, as it is used on Macs fairly widely. I suspect this is something funny about your installation. One thing I have found is that you can get this error

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Ralph Castain
I believe TCP works fine, Jody, as it is used on Macs fairly widely. I suspect this is something funny about your installation. One thing I have found is that you can get this error message when you have multiple NICs installed, each with a different subnet, and the procs try to connect across

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
On Aug 11, 2009, at 18:55 PM, Gus Correa wrote: Did you wipe off the old directories before reinstalling? Check. I prefer to install on a NFS mounted directory, Check Have you tried to ssh from node to node on all possible pairs? check - fixed this today, works fine with the
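One way to run the "ssh from node to node on all possible pairs" check is sketched below. The host names are the ones mentioned elsewhere in this thread, but the exact list is an assumption, and the sketch presumes passwordless ssh is already set up. BatchMode=yes makes ssh fail instead of prompting for a password, so a broken pair is reported rather than hanging.

```shell
# Assumed host list; replace with the nodes of the actual cluster.
HOSTS="xserve01 xserve02 xserve03"

# Print every ordered pair "source destination", skipping a host paired with itself.
list_pairs() {
  for src in $HOSTS; do
    for dst in $HOSTS; do
      if [ "$src" != "$dst" ]; then
        echo "$src $dst"
      fi
    done
  done
}

# Log in to each source host and, from there, try to reach each destination host.
# -n keeps ssh from swallowing the pair list that the loop reads on stdin.
check_pairs() {
  list_pairs | while read -r src dst; do
    if ssh -n -o BatchMode=yes "$src" ssh -o BatchMode=yes "$dst" true; then
      echo "ok   $src -> $dst"
    else
      echo "FAIL $src -> $dst"
    fi
  done
}
```

Calling check_pairs from the head node then prints one ok or FAIL line per ordered pair.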

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-11 Thread Gus Correa
Hi Jody Jody Klymak wrote: On Aug 11, 2009, at 17:35 PM, Gus Correa wrote: You can check this, say, by logging in to each node and doing /usr/local/openmpi/bin/ompi_info and comparing the output. Yep, they are all the same 1.3.3, SVN r21666, July 14th 2009. Did you wipe off the old

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-11 Thread Jody Klymak
On Aug 11, 2009, at 17:35 PM, Gus Correa wrote: You can check this, say, by logging in to each node and doing /usr/local/openmpi/bin/ompi_info and comparing the output. Yep, they are all the same 1.3.3, SVN r21666, July 14th 2009. What about passwords? ssh from server to node is

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-11 Thread Gus Correa
Hi Jody Are you sure you have the same OpenMPI version installed on /usr/local/openmpi on *all* nodes? The fact that the programs run on the xserver0, but hang when you try xserver0 and xserver1 together suggests some inconsistency in the runtime environment, which may come from different
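The version check Gus asks about could be scripted roughly as follows. The install path is the one used in this thread; the host list is an assumption, and so is the expectation that ompi_info prints a line containing "Open MPI:" followed by the version number.

```shell
# Assumed host list; replace with the nodes of the actual cluster.
HOSTS="xserve01 xserve02 xserve03"

# Print "host <version line>" for each node (assumes ompi_info emits an "Open MPI:" line).
collect_versions() {
  for host in $HOSTS; do
    printf '%s ' "$host"
    ssh -n "$host" /usr/local/openmpi/bin/ompi_info | grep 'Open MPI:'
  done
}

# Read "host <version line>" records on stdin and count the distinct version lines.
count_distinct() {
  awk '{ $1 = ""; print }' | sort -u | wc -l | tr -d ' '
}
```

Running collect_versions | count_distinct should print 1 when every node reports the same version; anything larger points to a mismatched install.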

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-11 Thread Ralph Castain
I can't speak to the tcp problem, but the following: [xserve02.local:43625] [[28627,0],2] orte:daemon:send_relay - recipient list is empty! is not an error message. It is perfectly normal operation. Ralph On Aug 11, 2009, at 1:54 PM, Jody Klymak wrote: Hello, On Aug 11, 2009, at 8:15

[OMPI users] tcp connectivity OS X and 1.3.3

2009-08-11 Thread Jody Klymak
Hello, On Aug 11, 2009, at 8:15 AM, Ralph Castain wrote: You can turn off those mca params I gave you as you are now past that point. I know there are others that can help debug that TCP btl error, but they can help you there. Just to eliminate the mitgcm from the debugging I compiled