Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-04, Ralph Castain
I'll let Tom suggest a solution for the PSM error, but you really need to remove those thread-related configure options. OMPI isn't really thread-safe at this point.

On Aug 4, 2013, at 6:26 PM, RoboBeans wrote:
> Hi Tom,
>
> As per your suggestion, I tried
>
> ./configure --with-psm --prefix=/o
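
Dropping the four thread-related options (--enable-event-thread-support, --enable-opal-multi-threads, --enable-orte-progress-threads, --enable-mpi-thread-multiple) from RoboBeans' configure command leaves a line like the one below. This is only a sketch of that single change, with the prefix copied from the original command; it does not address the separate PSM configure error.

```shell
# Same build as before, minus the four thread-related options.
./configure --with-psm --prefix=/opt/openmpi-1.7.2
```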

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-04, RoboBeans
Hi Tom,

As per your suggestion, I tried

./configure --with-psm --prefix=/opt/openmpi-1.7.2 --enable-event-thread-support --enable-opal-multi-threads --enable-orte-progress-threads --enable-mpi-thread-multiple

but I am getting this error:

--- MCA component mtl:psm (m4 configuration macro)

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-04, Elken, Tom
On 8/3/13 7:09 PM, RoboBeans wrote:
> On first 7 nodes:
> [mpidemo@SERVER-3 ~]$ ofed_info | head -n 1
> OFED-1.5.3.2:
> On last 4 nodes:
> [mpidemo@sv-2 ~]$ ofed_info | head -n 1
> -bash: ofed_info: command not found

[Tom] This is a pretty good clue that OFED is not installed on the last 4 nodes. You shou

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, Ralph Castain
Try adding "-mca btl sm,self,tcp" to your command line. Does everything work then? I'm thinking the problem is that we detect something not quite right about the OFED installation and abort, whereas earlier versions of OMPI may have just warned and continued by running over TCP instead. IIRC, some users comp
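
As a sketch of the suggestion above, the BTL restriction goes directly on the mpirun command line. Open MPI also reads MCA parameters from OMPI_MCA_* environment variables, so the export form should have the same effect. The process count, the hostfile name hosts and the program name ./test are placeholders, not values taken from the thread.

```shell
# Limit point-to-point transports to shared memory (sm), loopback (self) and TCP.
mpirun -mca btl sm,self,tcp -np 11 --hostfile hosts ./test

# The same MCA parameter ("btl") set through the environment instead.
export OMPI_MCA_btl=sm,self,tcp
mpirun -np 11 --hostfile hosts ./test
```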

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, RoboBeans
When I run the ibhosts command I can see this:

# ibhosts | sort
Ca: 0x00228870a432 ports 2 "sv-2 qib0"
Ca: 0x00228870a47c ports 2 "sv-3 qib0"
Ca: 0x00228870a4a8 ports 2 "sv-1 qib0"
Ca: 0x00228877ca2c ports 1 "@ HCA-1"
Ca: 0x00228877d7f4 ports 1 "SERVER-14 HC

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, RoboBeans
On the first 7 nodes:

[mpidemo@SERVER-3 ~]$ ofed_info | head -n 1
OFED-1.5.3.2:
[mpidemo@SERVER-3 ~]$ which ofed_info
/usr/bin/ofed_info

On the last 4 nodes:

[mpidemo@sv-2 ~]$ ofed_info | head -n 1
-bash: ofed_info: command not found
[mpidemo@sv-2 ~]$ which ofed_info
/usr/bin/which: no ofed_i
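
Running the same ofed_info check on every node in one pass can make a mixed installation easier to spot. The sketch below assumes passwordless ssh to each node and a host list file whose first field on each line is a hostname (so an Open MPI hostfile with slots=N entries should also work). The function name check_ofed_on_hosts and the file name hosts.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of `ofed_info` on each node.
# Assumes passwordless ssh; takes the first whitespace-separated field of
# each line in the given file as the hostname.
check_ofed_on_hosts() {
  local hostlist="$1" host rest version
  # "|| [ -n ... ]" also handles a last line with no trailing newline.
  while read -r host rest || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case "$host" in ''|'#'*) continue ;; esac
    # -n keeps ssh from consuming the loop's stdin; BatchMode makes ssh
    # fail instead of prompting for a password.
    if ! version=$(ssh -n -o BatchMode=yes "$host" 'ofed_info 2>/dev/null | head -n 1'); then
      echo "$host: ssh failed"
      continue
    fi
    # Empty output means ofed_info is missing on that node.
    echo "$host: ${version:-ofed_info not found}"
  done < "$hostlist"
}
```

Usage: check_ofed_on_hosts hosts.txt. On nodes like those above, one would expect lines such as "SERVER-3: OFED-1.5.3.2:" for the first group and "sv-2: ofed_info not found" for the last four.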

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, Ralph Castain
Are the OFED versions the same across all the machines? I suspect that might be the problem.

On Aug 3, 2013, at 4:06 PM, RoboBeans wrote:
> Hi Ralph, I tried using 1.5.4, 1.6.5 and 1.7.2 (compiled from source code)
> with no configuration arguments but I am facing the same issue. When I

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, RoboBeans
Hi Ralph, I tried using 1.5.4, 1.6.5 and 1.7.2 (compiled from source code) with no configuration arguments, but I am facing the same issue. When I run a job using 1.5.4 (installed using yum), I get warnings, but they don't affect my output. Example of a warning that I get:

sv-2.7960ipath_userinit

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, Ralph Castain
Hmmm... strange indeed. I would remove those four configure options and give it a try. That should eliminate the obvious suspects, although those options aren't generally involved in the issue shown here. Still, it's worth taking out potential sources of trouble. What is the connectivity between SER

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, RoboBeans
Thanks for looking into it, Ralph. I modified the hosts file but I am still getting the same error. Any other pointers you can think of? The difference between this 1.7.2 installation and 1.5.4 is that I installed 1.5.4 using yum, whereas for 1.7.2 I built from source and configured with --enabl

Re: [OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, Ralph Castain
It looks like SERVER-2 cannot talk to your x.x.x.100 machine. I note that you have some entries at the end of the hostfile that I don't understand: a list of hosts that can be reached? And I see that your x.x.x.22 machine isn't on it. Is that SERVER-2 by chance? Our hostfile parsing changed be

[OMPI users] ERROR: At least one pair of MPI processes are unable to reach each other for MPI communications.

2013-08-03, RoboBeans
Hello everyone, I have installed Open MPI 1.5.4 on an 11-node cluster using "yum install openmpi openmpi-devel" and everything seems to be working fine. For testing I am using this test program:

$ cat test.cpp
#include #incl